A practical walkthrough for collecting web data with Spider, from your first crawl to production pipelines.
Overview
One API for crawling, scraping, search, screenshots, and AI extraction. Bearer auth, JSON in, JSON out, optional streaming.
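Streaming lets you start processing results before the whole job finishes, which matters for large crawls. The sketch below assumes the stream is newline-delimited JSON (JSONL), one JSON object per line. That format is an assumption here, so check the API reference for how to request streaming and what each object contains. iter_jsonl is a hypothetical helper, not part of any Spider SDK.

```python
import json
from typing import Iterable, Iterator, Union


def iter_jsonl(lines: Iterable[Union[bytes, str]]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSONL stream.

    Accepts any iterable of lines, including the response object returned by
    urllib.request.urlopen, which yields bytes lines when iterated.
    """
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        line = line.strip()
        if not line:
            continue  # skip blank separator lines
        yield json.loads(line)


if __name__ == "__main__":
    # A fake stream standing in for a real streamed response.
    fake_stream = [b'{"url": "https://example.com/a"}\n', b"\n", b'{"url": "https://example.com/b"}\n']
    for item in iter_jsonl(fake_stream):
        print(item["url"])
```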
How it works
Every request runs three stages concurrently. A 500-page crawl typically completes in under 20 seconds. Every response includes a costs object and timing data so you can benchmark it yourself.
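Alongside the timing data in each response, you can measure wall-clock throughput on your own side. A minimal sketch, assuming you pass in any zero-argument function that runs a crawl and returns the list of pages it collected. The crawl_fn callable and the list-of-pages shape are assumptions, not Spider's documented response schema, and benchmark_crawl is a hypothetical helper.

```python
import time
from typing import Any, Callable, Sequence


def benchmark_crawl(crawl_fn: Callable[[], Sequence[Any]]) -> dict:
    """Time one crawl call client-side and report pages per second.

    crawl_fn is a hypothetical callable that performs a crawl (however you
    call Spider) and returns the pages it got back.
    """
    start = time.perf_counter()
    pages = crawl_fn()
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time when testing with instant fakes.
    rate = len(pages) / elapsed if elapsed > 0 else float("inf")
    return {"pages": len(pages), "seconds": elapsed, "pages_per_second": rate}


if __name__ == "__main__":
    # Stand-in for a real crawl: returns 500 fake pages immediately.
    result = benchmark_crawl(lambda: [{"url": f"https://example.com/{i}"} for i in range(500)])
    print(result["pages"], round(result["seconds"], 4))
```

At the quoted figure of 500 pages in under 20 seconds, this would report more than 25 pages per second. Client-side timing also includes your network latency, so expect it to read slightly slower than the server-side timing fields.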
API Endpoints
All endpoints accept JSON and return JSON. Authenticate with a Bearer token.
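A small request helper makes the convention concrete: one POST with a Bearer token and a JSON body. This sketch uses only the Python standard library. The base URL https://api.spider.cloud and the limit parameter are assumptions to check against the API reference, and build_request and send are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: the API base URL. Confirm it in the API reference.
API_BASE = "https://api.spider.cloud"


def build_request(endpoint: str, body: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to an endpoint such as "crawl"."""
    return urllib.request.Request(
        url=f"{API_BASE}/{endpoint.lstrip('/')}",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(req: urllib.request.Request, timeout: float = 60.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # "url" matches the /crawl description; "limit" is an assumed parameter name.
    req = build_request("crawl", {"url": "https://example.com", "limit": 5}, "YOUR-API-KEY")
    print(req.get_method(), req.full_url)
    # data = send(req)  # uncomment with a real key to make the call
```

Keeping construction separate from sending makes the request easy to inspect or log before any credits are spent.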

| Method | Path or host | Description |
|---|---|---|
| POST | /crawl | Start from a URL and follow links to discover and fetch multiple pages. |
| POST | /scrape | Fetch a single page and return its content in any format. |
| POST | /search | Search the web and optionally scrape the results. |
| POST | /screenshot | Capture a full-page screenshot as a base64-encoded PNG. |
| POST | /fetch/{domain}/{path} | AI-configured per-website scraper with cached configs. (Alpha) |
| GET | /data/scraper-directory | Browse optimized scraper configs for popular websites. |
| HTTP | proxy.spider.cloud | Route requests through intelligent residential, ISP, or mobile proxies. |
| WS | browser.spider.cloud | Connect a Playwright or Puppeteer client to a cloud browser via CDP. |

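Because /screenshot returns the image as base64 text, you decode it before writing a file. A minimal sketch: it only handles the base64 string itself, since where that string sits in the response JSON is not described here. decode_screenshot and save_screenshot are hypothetical helpers.

```python
import base64

# The fixed 8-byte signature every PNG file starts with.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(b64_png: str) -> bytes:
    """Decode a base64 string and verify that the result is a PNG."""
    # validate=True rejects non-base64 characters instead of silently skipping them.
    raw = base64.b64decode(b64_png, validate=True)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG")
    return raw


def save_screenshot(b64_png: str, path: str) -> int:
    """Decode and write the PNG to disk; returns the number of bytes written."""
    raw = decode_screenshot(b64_png)
    with open(path, "wb") as f:
        return f.write(raw)
```

Checking the signature first turns a malformed or truncated response into a clear error instead of a corrupt image file.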
Request modes
Pick how Spider fetches each page. See Concepts for the trade-offs.

| Mode | Notes |
|---|---|
| smart | Default |
| http | Fast |
| chrome | JavaScript-rendered pages and SPAs |

Proxy mode
Route any Spider request through proxy.spider.cloud, and Spider picks the pool, rotates IPs, and handles geo-routing. Use country_code to target a country and proxy to pick a pool. Works with Crawl, Scrape, Screenshot, Search, and Links. See the Proxy API reference for pricing.
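Assuming these options travel in the JSON request body like other parameters, combining a fetch mode with a proxy pool takes only a few extra keys. A minimal sketch: proxy and country_code are the parameter names given above, the mode and pool names are the ones listed on this page, and the request key for the mode is an assumption to confirm in the API reference. scrape_options is a hypothetical helper.

```python
from typing import Optional

# Names from this page; the "request" key for the mode is an assumption.
REQUEST_MODES = ("smart", "http", "chrome")
PROXY_POOLS = ("residential", "isp", "mobile")


def scrape_options(
    url: str,
    mode: str = "smart",
    proxy: Optional[str] = None,
    country_code: Optional[str] = None,
) -> dict:
    """Build a JSON body that picks a fetch mode and, optionally, a proxy pool and country."""
    if mode not in REQUEST_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {REQUEST_MODES}")
    body = {"url": url, "request": mode}
    if proxy is not None:
        if proxy not in PROXY_POOLS:
            raise ValueError(f"unknown proxy pool {proxy!r}; expected one of {PROXY_POOLS}")
        body["proxy"] = proxy
    if country_code is not None:
        body["country_code"] = country_code  # passed through unchanged
    return body


if __name__ == "__main__":
    print(scrape_options("https://example.com/app", mode="chrome", proxy="residential", country_code="de"))
```

Send the result as the JSON body of a POST to /scrape or /crawl. Validating names locally catches typos before a request is made.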

| Pool | Description |
|---|---|
| residential | Real-user IPs across 100+ countries. |
| isp | Stable datacenter IPs with the highest throughput. |
| mobile | Real 4G / 5G device IPs. |

Browser cloud
Full cloud browsers over a CDP WebSocket at wss://browser.spider.cloud/v1/browser?token=YOUR-API-KEY. Connect Playwright with chromium.connectOverCDP() or Puppeteer with puppeteer.connect(). Sessions include built-in stealth, proxy rotation, and optional session recording. Every plan allows 100 concurrent browsers. See the Browser API reference, or grab the spider-browser npm package.
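In Python, Playwright exposes the same connection as chromium.connect_over_cdp(). A minimal sketch, assuming Playwright for Python is installed (pip install playwright) and that the endpoint accepts the token exactly as in the URL above. browser_endpoint and page_title are hypothetical helper names.

```python
from urllib.parse import urlencode

BROWSER_WS = "wss://browser.spider.cloud/v1/browser"


def browser_endpoint(api_key: str) -> str:
    """Build the CDP WebSocket URL, URL-encoding the key as the token parameter."""
    return f"{BROWSER_WS}?{urlencode({'token': api_key})}"


def page_title(url: str, api_key: str) -> str:
    """Open a page in a Spider cloud browser via Playwright and return its title."""
    # Imported lazily so browser_endpoint() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(browser_endpoint(api_key))
        try:
            # Reuse the default context if the remote browser exposes one.
            context = browser.contexts[0] if browser.contexts else browser.new_context()
            page = context.new_page()
            page.goto(url)
            return page.title()
        finally:
            browser.close()
```

Closing the browser in a finally block releases the remote session even if navigation fails, which matters when you are working within a concurrency limit.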
Credits
Usage is measured in credits, priced at $1 per 10,000 credits. Each page has a base cost, and Chrome rendering, proxy usage, and AI extraction add to it. Failed requests, timeouts, and blocked pages cost nothing. Every response includes a costs object with a per-request breakdown, and the usage page shows your live balance.
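Converting between credits and dollars is a single division, and Decimal keeps the amounts exact. A short sketch: credits_to_usd follows directly from the $1 per 10,000 credits rate, while credits_per_page in pages_for_budget is a number you supply yourself, because the base cost per page is not listed here. Treat any value you plug in as an assumption. Both function names are hypothetical.

```python
from decimal import Decimal

# $1 per 10,000 credits, from the pricing above.
CREDITS_PER_USD = Decimal(10_000)


def credits_to_usd(credits) -> Decimal:
    """Convert a credit amount (e.g. from a costs breakdown) to US dollars."""
    return Decimal(str(credits)) / CREDITS_PER_USD


def pages_for_budget(usd, credits_per_page) -> int:
    """Estimate how many pages a dollar budget covers at a given per-page credit cost."""
    per_page = Decimal(str(credits_per_page))
    if per_page <= 0:
        raise ValueError("credits_per_page must be positive")
    total_credits = Decimal(str(usd)) * CREDITS_PER_USD
    return int(total_credits // per_page)
```

For example, at a hypothetical 2 credits per page, pages_for_budget("1", 2) returns 5000. Because failed and blocked requests cost nothing, real spend can only come in at or below an estimate built this way for the same page count.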
An overview of Spider's API capabilities, endpoints, request modes, output formats, and how to get started.
Extract contact information from any website using Spider's AI-powered pipeline. Emails, phone numbers, and more.
Archive web pages with Spider. Capture full page resources, automate regular crawls, and store content for long-term access.
Crawl multiple URLs with Spider's LangChain loader, then summarize the results with Groq and Llama 3.
Build a crewAI research pipeline that uses Spider to scrape financial data and write stock analysis reports.
Extract company info from inbound emails, scrape their website with Spider, and generate personalized replies with RAG.
Set up an Autogen agent that scrapes and crawls websites using the Spider API.
Route requests through Spider's proxy front-end for easy integration with third-party tools.
Two methods for crawling pages behind login walls: cookies and execution scripts.
Scaling web scraping for RAG pipelines. Error-first design, retry strategies, and handling failures at volume.
Choosing your scraper, cleaning HTML for RAG, deduplicating content, and testing on a single site before scaling up.
Add full-text static search to any website using Spider and Pagefind.
Build a research agent that searches the web with Spider, evaluates results, and forms answers with OpenAI.
Set up Spider Bot on your Discord server to fetch and analyze web data using slash commands.
Practical strategies for scaling headless Chrome, from container orchestration to Rust-based CDP handlers and ALB configuration.
Search the web and optionally scrape results in a single API call. Built for LLM pipelines, agents, and data collection.
A guide to all open source Spider projects: the core crawler, browser client, HTML transformer, TLS fingerprinting, and more. Quick-start examples for each.
Build your own web crawler with the open source spider Rust crate. Quick start, Docker setup, configuration, and when to upgrade to the cloud API.
A clear breakdown of Spider's pay-as-you-go pricing, per-endpoint costs, volume discounts, and how billing actually works for web scraping and browser automation.