A practical walkthrough for collecting web data with Spider, from your first crawl to production pipelines.
Spider Developer Platform
Collect web data at scale. Spider handles crawling, rendering, proxy rotation, and anti-bot evasion; you get clean data back through a single API.
How It Works
Every request goes through three stages: fetch (retrieve the page using HTTP or headless Chrome), process (render JavaScript, rotate proxies, handle anti-bot challenges), and deliver (convert to your chosen format and return). Spider's Rust-based engine runs all stages concurrently, so a 500-page crawl takes seconds, not hours.
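To make this concrete, here is a minimal sketch of a single-page request in Python. The api.spider.cloud host and the url/return_format body fields are assumptions based on typical usage; check the API reference for the exact parameter names.

```python
import os
import requests

# Minimal single-page request: Spider fetches the page, processes it
# (rendering, proxies, anti-bot handling), and delivers it in the
# requested format. Host and body field names are assumptions; see
# the API reference for the exact parameters.
API_KEY = os.environ["SPIDER_API_KEY"]

response = requests.post(
    "https://api.spider.cloud/scrape",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "url": "https://example.com",
        "return_format": "markdown",  # the "deliver" stage output format
    },
    timeout=60,
)
response.raise_for_status()
print(response.json())
```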
API Endpoints
All endpoints accept JSON and return JSON. Authenticate with a Bearer token.
| Method | Path | Description |
|---|---|---|
| POST | /crawl | Start from a URL and follow links to discover and fetch multiple pages. |
| POST | /scrape | Fetch a single page and return its content in any format. |
| POST | /search | Search the web and optionally scrape the results. |
| POST | /screenshot | Capture a full-page screenshot as base64 PNG. |
| POST | /fetch/{domain}/{path} | AI-configured per-website scraper with cached configs. (Alpha) |
| GET | /data/scraper-directory | Browse optimized scraper configs for popular websites. |
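As a rough sketch of the crawl endpoint, the request below starts from one URL, follows links, and caps the crawl at a page limit. The limit and return_format field names, and the list-shaped response, are assumptions for illustration; verify them against the API reference.

```python
import os
import requests

# Sketch of a multi-page crawl: start from one URL, follow links, and
# stop at a page limit. Field names and response shape are assumptions.
API_KEY = os.environ["SPIDER_API_KEY"]

response = requests.post(
    "https://api.spider.cloud/crawl",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "url": "https://example.com",
        "limit": 25,                 # stop after 25 discovered pages
        "return_format": "markdown",
    },
    timeout=300,
)
response.raise_for_status()

for page in response.json():         # assumed shape: one entry per crawled page
    print(page.get("url"))
```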
Request Modes
Choose how Spider fetches each page. smart (default) automatically picks between HTTP and Chrome based on the page. Use http for static HTML; it is the fastest and cheapest option. Use chrome when you need JavaScript rendering, SPA support, or real browser fingerprints for bot-protected sites. See Concepts for details.
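A sketch of per-request mode selection, assuming the mode is passed as a request field in the JSON body; the field name and its accepted values mirror the modes described above but should be confirmed in the API reference.

```python
import os
import requests

# Per-request mode selection. The "request" field name is an assumption;
# its values (http, chrome, smart) follow the modes described above.
API_KEY = os.environ["SPIDER_API_KEY"]

def scrape(url: str, mode: str = "smart"):
    """Fetch a single page with an explicit request mode."""
    resp = requests.post(
        "https://api.spider.cloud/scrape",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"url": url, "request": mode, "return_format": "markdown"},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()

docs_page = scrape("https://example.com/docs", mode="http")   # static HTML: fastest, cheapest
app_page = scrape("https://app.example.com", mode="chrome")   # JS-heavy page: full browser
```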
Credits
Usage is measured in credits, billed at $1 per 10,000 credits. Each page costs a base amount, with additional credits for Chrome rendering, proxy usage, and AI extraction. Every response includes a costs object with a per-request breakdown. Monitor your balance on the usage page.
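To keep the arithmetic straight, here is a tiny helper for converting a reported credit total into dollars at the documented rate. The 275-credit total below is a hypothetical figure, not a measured cost.

```python
def credits_to_usd(credits: int) -> float:
    """Convert a credit total to dollars at the documented $1 per 10,000 credits."""
    return credits / 10_000

# Hypothetical example: a response whose costs object reports a total
# of 275 credits would have cost $0.0275.
print(credits_to_usd(275))  # 0.0275
```

Summing these per-request totals across a batch gives a running spend figure you can compare against the usage page.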
Explore our guides
- An overview of Spider's API capabilities, endpoints, request modes, output formats, and how to get started.
- Extract contact information from any website using Spider's AI-powered pipeline. Emails, phone numbers, and more.
- Archive web pages with Spider. Capture full page resources, automate regular crawls, and store content for long-term access.
- Crawl multiple URLs with Spider's LangChain loader, then summarize the results with Groq and Llama 3.
- Build a crewAI research pipeline that uses Spider to scrape financial data and write stock analysis reports.
- Extract company info from inbound emails, scrape their website with Spider, and generate personalized replies with RAG.
- Set up an Autogen agent that scrapes and crawls websites using the Spider API.
- Route requests through Spider's proxy front-end for easy integration with third-party tools.
- Three methods for crawling pages behind login walls: cookies, execution scripts, and AI-driven actions.
- Scaling web scraping for RAG pipelines. Error-first design, retry strategies, and handling failures at volume.
- Choosing your scraper, cleaning HTML for RAG, deduplicating content, and testing on a single site before scaling up.
- Add full-text static search to any website using Spider and Pagefind.
- Build a research agent that searches the web with Spider, evaluates results, and forms answers with OpenAI.
- Set up Spider Bot on your Discord server to fetch and analyze web data using slash commands.
- Practical strategies for scaling headless Chrome, from container orchestration to Rust-based CDP handlers and ALB configuration.
- Search the web and optionally scrape results in a single API call. Built for LLM pipelines, agents, and data collection.