Concepts
How crawling, scraping, and delivery fit together — and the knobs you'll reach for most often.
Crawling vs. scraping
Spider supports two core operations. Scraping fetches a single page and returns its content. Crawling starts from a URL and follows links to discover and fetch multiple pages across a site. Both accept the same parameters for output format, proxy usage, and request mode. See Scraping and Crawling for endpoint specifics.
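As a rough sketch of how the two operations can share one set of parameters, the helper below builds a request description for either one without sending it. The base URL, the /scrape and /crawl paths, and the request and limit field names are assumptions made for illustration; only return_format is named on this page. Check the Scraping and Crawling pages for the real endpoint details.

```python
import json

# Assumed for illustration; confirm against the endpoint reference.
BASE_URL = "https://api.spider.cloud"


def build_request(operation, url, return_format="markdown", mode="smart", limit=None):
    """Describe a scrape or crawl call without sending it."""
    if operation not in ("scrape", "crawl"):
        raise ValueError(f"unknown operation: {operation!r}")
    payload = {
        "url": url,
        "return_format": return_format,  # parameter named on this page
        "request": mode,                 # hypothetical field name for the request mode
    }
    if operation == "crawl" and limit is not None:
        payload["limit"] = limit         # hypothetical page cap, crawls only
    return {"endpoint": f"{BASE_URL}/{operation}", "body": json.dumps(payload)}


scrape = build_request("scrape", "https://example.com/pricing")
crawl = build_request("crawl", "https://example.com", limit=50)
print(scrape["endpoint"])
print(crawl["body"])
```

The point of the sketch is that the two payloads differ only in the endpoint and the crawl-specific cap; the format and mode settings carry over unchanged.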
Request modes
Every request uses one of three modes. The default, smart, inspects each page and picks between a lightweight HTTP fetch and a full Chrome browser based on what the page actually needs.
| Mode | When to use | Speed | Cost | JS rendering |
|---|---|---|---|---|
| smart | Default. Works for most sites. | Fast | Low – medium | Auto-detected |
| http | Static HTML, APIs, known simple pages. | Fastest | Lowest | No |
| chrome | SPAs, JS-rendered content, bot-protected sites. | Slower | Higher | Yes |
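The table can be read as a small decision rule. The sketch below encodes it; the function and its flags are hypothetical and only illustrate how you might choose a mode yourself, not how Spider's own smart detection works.

```python
def pick_mode(needs_js=None, bot_protected=False):
    """Map what you know about a page to one of the three request modes.

    needs_js is True, False, or None when you do not know.
    """
    if bot_protected or needs_js is True:
        return "chrome"   # SPAs, JS-rendered content, bot-protected sites
    if needs_js is False:
        return "http"     # static HTML, APIs, known simple pages
    return "smart"        # unknown: leave the choice to the default mode


print(pick_mode())
print(pick_mode(needs_js=False))
print(pick_mode(bot_protected=True))
```

When in doubt, leaving the mode unset and relying on the default is the simplest option; forcing http or chrome mainly makes sense when you already know what the target pages need.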
Concurrent crawling
The Rust engine runs crawls with full concurrency. Pages are fetched, rendered, and processed in parallel, so a 500-page crawl doesn't take 500× longer than a single page. Concurrency is managed server-side, so there are no thread pools or connection limits to wire up. For large jobs, pair concurrency with streaming so you can process pages the moment they arrive.
Output formats
The return_format parameter controls how Spider delivers page content. Markdown is the default for AI workloads — structure preserved, navigation and ads stripped, clean LLM context at a fraction of the token cost of raw HTML.
| Format | What you get | Best for |
|---|---|---|
| raw | Original HTML as returned by the server. | Parsing with your own tools, archiving. |
| markdown | Clean text with structure preserved. Navigation, scripts, and boilerplate stripped. | LLMs, RAG pipelines, content analysis. |
| text | Plain text without any markup. | Simple text extraction, word counts. |
| bytes | Binary data for non-HTML resources. | PDFs, images, file downloads. |
Streaming
With streaming on, Spider returns each page as a JSON line the moment it finishes, without buffering the full result set. That means lower memory use, no HTTP timeouts, and a faster time to first result. See Concurrent Streaming for full examples.
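A minimal sketch of consuming a JSON-lines stream: it parses one page per line as the lines arrive instead of waiting for the full body. The url and content field names in the sample are assumptions, and the sample list stands in for a real response. With the requests library you would typically pass in response.iter_lines() from a call made with stream=True.

```python
import json


def iter_pages(lines):
    """Yield one parsed page per non-empty JSON line."""
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue  # skip blank keep-alive lines
        yield json.loads(raw)


# Stand-in for a streamed response body; field names are hypothetical.
sample = [
    b'{"url": "https://example.com/", "content": "# Home"}',
    b"",
    b'{"url": "https://example.com/about", "content": "# About"}',
]

pages = list(iter_pages(sample))
for page in pages:
    print(page["url"], len(page["content"]))
```

Because iter_pages is a generator, each page can be handed to downstream processing as soon as its line is read, which is where the memory savings come from.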
Screenshots
The /screenshot endpoint captures full-page or viewport-sized images as PNG, JPEG, or WebP, returned as base64 or raw binary. Useful for visual regression tests, archiving page appearances, or pairing visual context with extracted text. Always uses Chrome rendering, so JavaScript-heavy pages render correctly.
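When the image comes back as base64, it has to be decoded before it can be saved or compared. The sketch below shows only that step, using a tiny stand-in payload; where the base64 string sits in the response is not covered on this page, so the helper takes the string directly.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def decode_screenshot(b64_data):
    """Decode a base64 screenshot and report whether it looks like a PNG."""
    image = base64.b64decode(b64_data)
    return image, image.startswith(PNG_SIGNATURE)


# Stand-in payload: just the PNG signature, not a full image.
fake_b64 = base64.b64encode(PNG_SIGNATURE).decode("ascii")
image_bytes, is_png = decode_screenshot(fake_b64)
print(len(image_bytes), is_png)
```

With a real screenshot, the decoded bytes can be written to disk in binary mode, for example with open("page.png", "wb").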
AI extraction
Spider can pull structured data from pages with AI. With an AI Studio subscription, describe the fields you want and Spider returns structured JSON instead of raw content. Good for product details, contact info, or any repeatable shape, with no CSS selectors required. See JSON Scraping for the parameter reference.
Credits
Usage is measured in credits at $1 per 10,000 credits. Each crawled page has a base cost; Chrome rendering, proxy usage, and AI extraction add on top. Failed requests, timeouts, and blocked pages cost zero. Every response includes a costs field with a per-request breakdown — view live balance and history on the usage page.
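To make the arithmetic concrete, the sketch below converts credits to dollars at the stated rate and adds per-page add-ons to a base cost. The per-page credit values are placeholders chosen for the example, not Spider's actual pricing; use the costs field in real responses for true figures.

```python
CREDITS_PER_DOLLAR = 10_000  # from the rate above: $1 per 10,000 credits


def credits_to_usd(credits):
    """Convert a credit amount to US dollars."""
    return credits / CREDITS_PER_DOLLAR


def estimate_crawl_credits(pages, base=1.0, chrome=0.0, proxy=0.0, ai=0.0):
    """Sum the per-page base cost and add-ons; all per-page values are placeholders."""
    return pages * (base + chrome + proxy + ai)


total = estimate_crawl_credits(pages=500, base=1.0, chrome=2.0)
print(total, credits_to_usd(total))
```

Since failed requests, timeouts, and blocked pages cost zero, an estimate like this is an upper bound for a crawl in which every page succeeds.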