
Spider API

An overview of Spider's API capabilities, endpoints, request modes, output formats, and how to get started.

Jeff Mendez · 3 min read

Spider API Overview

What the API Does

Spider’s API turns any URL into structured data. You send a URL and configuration, and get back page content in the format you need: HTML, markdown, plain text, or structured JSON. The API handles proxy rotation, JavaScript rendering, rate limiting, and anti-bot detection on your behalf.
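
A crawl response is a list of page objects. As a rough sketch of the shape, inferred from the examples later in this guide (exact fields depend on your request configuration):

# Sketch of a crawl response, inferred from the examples in this guide;
# exact fields depend on your request configuration.
pages = [
    {
        "url": "https://example.com",
        "content": "# Example Domain ...",  # in the requested return_format
        "status": 200,
    },
]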

Core Capabilities

  • Proxy rotation: Datacenter, residential, and mobile proxies with automatic failover
  • Full concurrency: Crawl thousands of pages per minute using the Rust-based engine
  • Smart mode: Automatically picks HTTP or Chrome rendering based on each page’s needs
  • Caching: Repeated crawls of the same pages return faster
  • Output formats: Markdown, HTML, plain text, or raw bytes
  • AI extraction: Pull structured fields from pages using built-in LLM integration
  • Anti-bot measures: Browser fingerprinting and proxy rotation to reduce blocks

See the full API reference for every endpoint and parameter.

Endpoints

  • /crawl: Start from a URL, follow links, and return content for each page
  • /scrape: Fetch a single URL and return its content
  • /search: Search the web and optionally scrape the results
  • /screenshot: Capture a full-page screenshot as a base64-encoded PNG
  • /links: Get the link graph for a URL
  • /pipeline/extract-contacts: Extract contact information from pages
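
For single pages, /scrape takes the same headers and body shape as the /crawl example in the Quick Start below. A minimal sketch, with placeholder values:

import requests, os

headers = {
    'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
    'Content-Type': 'application/json',
}

# Fetch one page and return its content as markdown
response = requests.post('https://api.spider.cloud/scrape',
  headers=headers,
  json={
    "url": "https://example.com",
    "return_format": "markdown"
  }
)
print(response.json())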

Quick Start

import requests, os

headers = {
    'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
    'Content-Type': 'application/json',
}

# Crawl a site and get markdown
response = requests.post('https://api.spider.cloud/crawl',
  headers=headers,
  json={
    "url": "https://example.com",
    "limit": 10,
    "return_format": "markdown",
    "request": "smart"
  }
)

for page in response.json():
    print(f"{page['url']}: {len(page['content'])} chars")

Request Modes

The request parameter controls how Spider fetches each page:

  • smart (default): Automatically picks HTTP or Chrome based on page requirements
  • http: Static pages, sitemaps, and APIs; the fastest and cheapest option
  • chrome: SPAs, JS-rendered content, and pages behind Cloudflare or similar protections
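
If you already know a page needs a real browser, you can pin the mode instead of relying on smart. A minimal sketch, reusing the Quick Start setup with a placeholder URL:

import requests, os

headers = {
    'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
    'Content-Type': 'application/json',
}

# Force Chrome rendering for a JavaScript-heavy page
response = requests.post('https://api.spider.cloud/scrape',
  headers=headers,
  json={
    "url": "https://example.com/app",
    "return_format": "markdown",
    "request": "chrome"
  }
)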

Output Formats

Set return_format to control what you get back:

  • raw: Original HTML; best for parsing with your own tools
  • markdown: Clean markdown; best for LLM ingestion and RAG pipelines
  • text: Plain text; best for search indexing and NLP tasks
  • bytes: Raw bytes; best for binary content and downloads

For AI workflows, markdown strips navigation, ads, and boilerplate, giving you just the page content.
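
As an illustrative sketch, feeding a markdown crawl into a RAG pipeline might look like this; the fixed-size chunking step is an assumption for the example, not part of the API:

import requests, os

headers = {
    'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
    'Content-Type': 'application/json',
}

response = requests.post('https://api.spider.cloud/crawl',
  headers=headers,
  json={
    "url": "https://example.com",
    "limit": 5,
    "return_format": "markdown"
  }
)

# Naive fixed-size chunking before embedding (illustrative only)
chunks = []
for page in response.json():
    content = page['content']
    chunks += [content[i:i + 2000] for i in range(0, len(content), 2000)]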

Streaming Large Crawls

For crawls over a few dozen pages, use streaming to process results as they arrive:

import requests, json, os

headers = {
    'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
    'Content-Type': 'application/jsonl',
}

response = requests.post('https://api.spider.cloud/crawl',
  headers=headers,
  json={
    "url": "https://example.com",
    "limit": 250,
    "return_format": "markdown"
  },
  stream=True
)

for line in response.iter_lines(decode_unicode=True):
    if line:
        page = json.loads(line)
        print(f"Crawled: {page['url']} ({page['status']})")

Set Content-Type to application/jsonl and enable stream=True in your HTTP client. See the streaming docs for more details.

SDK Libraries

Official SDKs handle authentication and provide typed methods for all endpoints.

Spider also integrates with LangChain, LlamaIndex, CrewAI, and other AI frameworks as a document loader.
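
For example, LangChain ships a Spider document loader in langchain_community. A minimal sketch, assuming that integration is installed:

import os
from langchain_community.document_loaders import SpiderLoader

# Load a page through Spider as LangChain Documents
loader = SpiderLoader(
    url="https://example.com",
    api_key=os.getenv("SPIDER_API_KEY"),
    mode="scrape",  # "crawl" follows links from the starting URL
)
docs = loader.load()
print(len(docs), docs[0].metadata)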

Getting Set Up

  1. Register or sign in.
  2. Purchase credits. Pay-as-you-go at $1 per 10,000 credits.
  3. Create an API key.
  4. Start making requests.

Track your usage on the usage page.
