
Overview

One API for crawling, scraping, search, screenshots, and AI extraction. Bearer auth, JSON in, JSON out, optional streaming.

Developer quickstart: Install the SDK, export your API key, and make your first request (about 3 minutes).
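Request

```python
import json
import os

import requests

# Read the key from the environment. A literal '$SPIDER_API_KEY' inside a
# Python string is sent as-is and is never expanded.
headers = {
    'Authorization': f"Bearer {os.environ['SPIDER_API_KEY']}",
    'Content-Type': 'application/json',
}

json_data = {"limit": 5, "url": "https://example.com"}

with requests.post('https://api.spider.cloud/crawl',
                   headers=headers, json=json_data, stream=True) as r:
    r.raise_for_status()

    # Accumulate chunks until the buffer holds one complete JSON document.
    buffer = b""
    for chunk in r.iter_content(chunk_size=8192):
        if chunk:
            buffer += chunk
            try:
                data = json.loads(buffer.decode('utf-8'))
                print(data)
                buffer = b""
            except (json.JSONDecodeError, UnicodeDecodeError):
                # Incomplete document, or a multi-byte character split
                # across chunks: keep reading.
                continue
```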

Crawl: Follow links across entire sites. Set depth, limit, and domain scope.
Scrape: Fetch a single page as HTML, markdown, text, or structured JSON.
Search: Search the web and scrape the results in one request.
Screenshot: Capture full-page screenshots with Chrome rendering.
Streaming: Process pages as they finish instead of waiting for the full result.
AI Extraction: Extract structured data from any page using AI or CSS selectors.
Data Connectors: Stream results directly to S3, Google Cloud, Azure Blob, Sheets, or Supabase.
Anti-Bot Bypass: Automatic fingerprint rotation, stealth mode, and a retry engine to get past bot protection.
Proxy Mode: Geo-routing across residential, ISP, and mobile proxies in 100+ countries.
Browser Cloud: Full cloud browsers via CDP WebSocket, compatible with Playwright and Puppeteer, with stealth, proxies, and recording.
Fetch API (Alpha): AI-configured per-website scrapers that discover selectors automatically, then cache and reuse them.
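
To consume Streaming output page by page, a client has to split the byte stream into individual records. The sketch below assumes the stream is newline-delimited JSON (JSON Lines), one page object per line; that wire format, and the helper name `iter_jsonl`, are assumptions for illustration, not something this page confirms. The key point is buffering, since a single line can be split across network chunks.

```python
import json
from typing import Iterable, Iterator


def iter_jsonl(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield one parsed JSON object per line of a JSON Lines byte stream.

    Bytes are buffered until a newline arrives, so a record split across
    chunks is parsed only once it is complete.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete line currently held in the buffer.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield json.loads(line)  # json.loads accepts UTF-8 bytes
    # The last record may arrive without a trailing newline.
    if buffer.strip():
        yield json.loads(buffer)


if __name__ == "__main__":
    # Simulated chunks: the second record is split mid-line.
    fake_chunks = [
        b'{"url": "https://example.com"}\n{"url": "https://exa',
        b'mple.com/about"}\n',
    ]
    for page in iter_jsonl(fake_chunks):
        print(page["url"])
```

With `requests`, you would pass `r.iter_content(chunk_size=8192)` as `chunks`. Splitting on the raw newline byte is safe for UTF-8, because that byte never appears inside a multi-byte character.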

How it works

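Each request moves through three stages, and the stages run concurrently across pages, so one page can be fetched while earlier pages are still being processed or delivered. A 500-page crawl typically completes in under 20 seconds. Every response includes a costs object and timing data so you can benchmark this yourself.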

1. Fetch: Retrieve the page with HTTP or a headless Chrome session.
2. Process: Render JavaScript, rotate proxies, and handle anti-bot challenges.
3. Deliver: Convert to markdown, HTML, JSON, or bytes and stream the result back.
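
The idea of overlapping stages can be illustrated with a toy pipeline in which queues connect a fetch, a process, and a deliver stage. This is a conceptual sketch only, not Spider's implementation; `run_pipeline` and the stage callables are hypothetical names, and the fake fetcher stands in for real network I/O.

```python
import asyncio
from typing import Awaitable, Callable, List

_DONE = object()  # sentinel marking the end of a stage's output


async def run_pipeline(
    urls: List[str],
    fetch: Callable[[str], Awaitable[str]],
    process: Callable[[str], str],
    deliver: Callable[[str, str], dict],
) -> List[dict]:
    """Run fetch -> process -> deliver as three concurrent stages."""
    fetched: asyncio.Queue = asyncio.Queue()
    processed: asyncio.Queue = asyncio.Queue()
    results: List[dict] = []

    async def fetch_stage() -> None:
        for url in urls:
            raw = await fetch(url)  # while this awaits, later stages keep working
            await fetched.put((url, raw))
        await fetched.put(_DONE)

    async def process_stage() -> None:
        while True:
            item = await fetched.get()
            if item is _DONE:
                break
            url, raw = item
            await processed.put((url, process(raw)))
        await processed.put(_DONE)

    async def deliver_stage() -> None:
        while True:
            item = await processed.get()
            if item is _DONE:
                break
            url, body = item
            results.append(deliver(url, body))

    await asyncio.gather(fetch_stage(), process_stage(), deliver_stage())
    return results


if __name__ == "__main__":
    async def fake_fetch(url: str) -> str:
        await asyncio.sleep(0.01)  # stand-in for network latency
        return f"<h1>{url}</h1>"

    pages = asyncio.run(run_pipeline(
        ["https://example.com/a", "https://example.com/b"],
        fetch=fake_fetch,
        process=lambda raw: raw.replace("<h1>", "# ").replace("</h1>", ""),
        deliver=lambda url, body: {"url": url, "content": body},
    ))
    print(pages)
```

Because each stage only waits on its own input queue, total time approaches that of the slowest stage rather than the sum of all three per page.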

API Endpoints

The REST endpoints return JSON, and the POST endpoints take a JSON request body. Authenticate with a Bearer token.

| Method | Path or host | Description |
|---|---|---|
| POST | /crawl | Start from a URL and follow links to discover and fetch multiple pages. |
| POST | /scrape | Fetch a single page and return its content in any format. |
| POST | /search | Search the web and optionally scrape the results. |
| POST | /screenshot | Capture a full-page screenshot as base64 PNG. |
| POST | /fetch/{domain}/{path} | AI-configured per-website scraper with cached configs. (Alpha) |
| GET | /data/scraper-directory | Browse optimized scraper configs for popular websites. |
| HTTP | proxy.spider.cloud | Route requests through residential, ISP, or mobile proxies with automatic pool selection and IP rotation. |
| WS | browser.spider.cloud | Connect a Playwright or Puppeteer client to a cloud browser via CDP. |
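
The REST rows above share one calling convention: Bearer auth, plus a JSON body for POST. The sketch below builds such requests with the standard library so it runs without extra dependencies; `build_request` is a hypothetical helper, and the base URL `https://api.spider.cloud` is taken from the quickstart. It only constructs the request; sending it is left to the caller.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.spider.cloud"  # base URL used in the quickstart


def build_request(path: str, api_key: str,
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated Spider REST request.

    Sends POST with a JSON body when a payload is given, otherwise GET.
    """
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if payload is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(payload).encode("utf-8")
    method = "POST" if data is not None else "GET"
    return urllib.request.Request(API_BASE + path, data=data,
                                  headers=headers, method=method)
```

For example, `build_request("/scrape", key, {"url": "https://example.com"})` prepares a POST, and `build_request("/data/scraper-directory", key)` prepares a GET; pass either to `urllib.request.urlopen` to send it and `json.load` the response.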

Request modes

Pick how Spider fetches each page. See Concepts for the trade-offs.

smart (default): Automatically picks between HTTP and Chrome based on the page.
http (fast): Static HTML only. The fastest and cheapest mode.
chrome (JS / SPA): Full browser rendering. Use it for SPAs or bot-protected sites.
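
The guidance above can be encoded as a small helper that chooses a mode and adds it to a crawl payload. This is a sketch under an assumption: the JSON field name `request` for the mode is not stated on this page, so confirm it in the API reference. `choose_mode` and `crawl_payload` are hypothetical names; `url` and `limit` come from the quickstart.

```python
from typing import Optional

# Mode names from the Request modes list.
REQUEST_MODES = ("smart", "http", "chrome")


def choose_mode(needs_js: bool = False, bot_protected: bool = False,
                known_static: bool = False) -> str:
    """Apply the documented rules of thumb for picking a mode."""
    if needs_js or bot_protected:
        return "chrome"  # SPAs and bot-protected sites need full rendering
    if known_static:
        return "http"    # static HTML: fastest and cheapest
    return "smart"       # unsure: let Spider decide per page


def crawl_payload(url: str, mode: str = "smart",
                  limit: Optional[int] = None) -> dict:
    if mode not in REQUEST_MODES:
        raise ValueError(f"unknown request mode {mode!r}; expected one of {REQUEST_MODES}")
    payload = {"url": url, "request": mode}  # "request" key is an assumption
    if limit is not None:
        payload["limit"] = limit
    return payload
```

Defaulting to `smart` keeps the cost low for static pages while still falling back to Chrome when a page needs it.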

Proxy mode

Route any Spider request through proxy.spider.cloud, and Spider picks the pool, rotates IPs, and handles geo-routing. Use the country_code parameter to target a country and the proxy parameter to pick a pool. Proxy mode works with Crawl, Scrape, Screenshot, Search, and Links. See the Proxy API reference for pricing.

residential: Real-user IPs across 100+ countries.
isp: Stable datacenter IPs with the highest throughput.
mobile: Real 4G / 5G device IPs.
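
The `country_code` and `proxy` parameters can be layered onto any request payload. The sketch below validates the pool name against the three pools listed and returns a new payload without mutating the input. `with_proxy` is a hypothetical helper, and the country code is passed through unchanged because this page does not specify its exact format.

```python
from typing import Optional

PROXY_POOLS = ("residential", "isp", "mobile")  # pools listed in Proxy mode


def with_proxy(payload: dict, pool: str,
               country_code: Optional[str] = None) -> dict:
    """Return a copy of `payload` with proxy routing parameters added."""
    if pool not in PROXY_POOLS:
        raise ValueError(f"unknown proxy pool {pool!r}; expected one of {PROXY_POOLS}")
    updated = dict(payload)  # shallow copy so the caller's dict is untouched
    updated["proxy"] = pool
    if country_code is not None:
        updated["country_code"] = country_code
    return updated
```

For example, `with_proxy({"url": "https://example.com"}, "residential", "us")` yields a payload that can be sent to /scrape or /crawl like any other.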

Browser cloud

Full cloud browsers over a CDP WebSocket at wss://browser.spider.cloud/v1/browser?token=YOUR-API-KEY. Connect with Playwright's connectOverCDP() or Puppeteer's connect(). Sessions include built-in stealth, proxy rotation, and optional session recording. Every plan includes 100 concurrent browsers. See the Browser API reference, or install the spider-browser npm package.
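
From Python, Playwright exposes the same CDP connection as `chromium.connect_over_cdp`. The sketch below builds the WebSocket URL with the key percent-encoded, then opens a page and reads its title. `browser_ws_url` and `page_title` are hypothetical helpers; the sketch assumes the `playwright` package is installed and that the remote browser accepts a new page in its default context.

```python
from urllib.parse import urlencode

BROWSER_ENDPOINT = "wss://browser.spider.cloud/v1/browser"


def browser_ws_url(api_key: str) -> str:
    """Build the CDP WebSocket URL, percent-encoding the key in the query."""
    return f"{BROWSER_ENDPOINT}?{urlencode({'token': api_key})}"


def page_title(url: str, api_key: str) -> str:
    """Open `url` in a Spider cloud browser and return its title."""
    # Imported here so browser_ws_url works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(browser_ws_url(api_key))
        try:
            # Reuse the remote browser's default context when it exposes one.
            context = browser.contexts[0] if browser.contexts else browser.new_context()
            page = context.new_page()
            page.goto(url)
            return page.title()
        finally:
            browser.close()
```

Usage: `page_title("https://example.com", os.environ["SPIDER_API_KEY"])`. Encoding the token matters if a key ever contains characters such as `+` or `&`, which would otherwise corrupt the query string.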

Credits

Usage is measured in credits at $1 per 10,000 credits. Each page has a base cost, and Chrome rendering, proxy usage, and AI extraction add to it. Failed requests, timeouts, and blocked pages cost nothing. Every response includes a costs object with a per-request breakdown, and the usage page shows your live balance.
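
At $1 per 10,000 credits, one credit is $0.0001, so 25,000 credits cost $2.50. The sketch below does that conversion with `Decimal` to avoid floating-point rounding, and sums a page's cost as base plus add-ons, returning zero for failed, timed-out, or blocked requests. The helper names and any per-item credit amounts you pass in are hypothetical placeholders; real figures come from the costs object in each response.

```python
from decimal import Decimal
from typing import Dict

CREDITS_PER_USD = 10_000  # $1 per 10,000 credits


def credits_to_usd(credits: int) -> Decimal:
    """Convert a credit count to dollars exactly."""
    return Decimal(credits) / CREDITS_PER_USD


def request_credits(base: int, addons: Dict[str, int],
                    succeeded: bool = True) -> int:
    """Base cost plus add-ons (e.g. Chrome, proxy, AI); zero if the request failed."""
    if not succeeded:
        return 0  # failed requests, timeouts, and blocked pages are free
    return base + sum(addons.values())
```

Keeping amounts in integer credits until the final conversion means totals across many requests never accumulate rounding error.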

Guides