# Spider > The fastest web crawling and scraping API for AI agents, RAG pipelines, and LLMs. > Crawl 100K+ pages per second. Get clean markdown, structured JSON, and AI-ready data. - Website: https://spider.cloud - API Base URL: https://api.spider.cloud - Documentation: https://spider.cloud/docs/overview - API Reference: https://spider.cloud/docs/api - OpenAPI Spec: https://spider.cloud/openapi.yaml - Pricing: https://spider.cloud/credits/new - MCP Server: https://www.npmjs.com/package/spider-cloud-mcp (npx spider-cloud-mcp) - Open Source: https://github.com/spider-rs/spider (MIT License) - Discord: https://discord.spider.cloud - Support: support@spider.cloud ## What Spider Does Spider is a web crawling and scraping API built for developers building AI applications. It converts any website into clean, structured data optimized for LLMs, RAG pipelines, and AI agent workflows. Key capabilities: - Crawl entire websites at 100K+ pages per second - Output formats: Markdown, HTML, JSON, JSONL, CSV, XML, plain text - JavaScript rendering with headless Chrome and Firefox - AI-powered natural language extraction (no CSS selectors needed) - Real-time web search API - Screenshot capture - Anti-bot bypass and proxy rotation - Webhook and cron scheduling - Pay-per-use pricing (no subscriptions required) ## Getting Started - Quickstart: https://spider.cloud/docs/quickstart - Get API Key: https://spider.cloud/api-keys - Interactive Playground: https://spider.cloud/playground - Concepts: https://spider.cloud/docs/concepts ## API Endpoints ### Core Endpoints - POST /crawl - Crawl websites and extract content from multiple pages - POST /scrape - Scrape a single page - POST /search - Search the web, optionally crawl results - POST /links - Extract all links from a page - POST /screenshot - Capture page screenshots - POST /unblocker - Access bot-protected content - POST /transform - Convert HTML to markdown/text/other formats - GET /data/credits - Check available credits ### AI Endpoints (Subscription Required) Natural language web data extraction — describe what you need in plain English: - POST /ai/crawl - AI-guided crawling with natural language prompts - POST /ai/scrape - Extract structured data using plain English - POST /ai/search - AI-enhanced semantic web search - POST /ai/browser - Automate browser interactions with natural language - POST /ai/links - Intelligent link extraction and filtering AI pricing: https://spider.cloud/ai/pricing ## Authentication All requests require a Bearer token: ``` Authorization: Bearer YOUR_API_KEY ``` ## Key Request Parameters Most endpoints accept these alongside `url` / `query`: - `request` - `"http"` (fastest, no JS), `"chrome"` (headless browser), or `"smart_mode"` (HTTP first, escalates to Chrome only when the page needs it) - `limit` - max pages for `/crawl` - `return_format` - `"markdown"` (default for LLMs), `"raw"`, `"text"`, `"html"`, `"bytes"`, `"commonmark"` - `remote_proxy` - route through your own proxy, e.g. `"http://user:pass@host:port"` - `css_extraction_map` - object keyed by URL-path pattern (`"/"` matches everything). Each value is an array of `{ "name": "", "selectors": ["", ...] }` entries. List multiple selectors as fallbacks. Results returned under `css_extracted`. - `wait_for` - object controlling when Chrome considers a page "ready". Sub-keys: `selector` (`{selector, timeout}`), `idle_network` (`{timeout}`), `idle_network0`, `almost_idle_network0`, `dom` (`{selector, timeout}`), `delay` (`{timeout}`), `page_navigations` (boolean). Each `timeout` is a Rust `Duration`: `{ "secs": , "nanos": }`. Only applies in `chrome` / `smart_mode`. Example: ```json { "url": "https://store.example.com/product/123", "request": "chrome", "wait_for": { "selector": { "selector": "h1.product-title", "timeout": { "secs": 10, "nanos": 0 } }, "idle_network": { "timeout": { "secs": 5, "nanos": 0 } } }, "css_extraction_map": { "/": [ { "name": "title", "selectors": ["h1.product-title"] }, { "name": "price", "selectors": [".price-value"] }, { "name": "images", "selectors": ["img.hero-image", "img.gallery"] } ] } } ``` ## SDKs & Libraries - Python: https://spider.cloud/docs/libraries - Node.js/JavaScript: https://spider.cloud/docs/libraries - Rust: https://spider.cloud/docs/libraries - Go: https://spider.cloud/docs/libraries ## Integrations Spider integrates with popular AI frameworks: - LangChain: https://spider.cloud/docs/integrations/langchain - LlamaIndex: https://spider.cloud/docs/integrations/llamaindex - CrewAI: https://spider.cloud/docs/integrations/crewai - Agno: https://spider.cloud/docs/integrations/agno - FlowiseAI: https://spider.cloud/docs/integrations/flowiseai - Zapier: https://spider.cloud/docs/integrations/zapier - x402 (Crypto Payments): https://spider.cloud/docs/integrations/x402 ## Use Cases - RAG Pipelines: https://spider.cloud/use-cases/rag - AI Training Data: https://spider.cloud/use-cases/ai-training - Price Monitoring: https://spider.cloud/use-cases/price-monitoring - SEO Tracking: https://spider.cloud/use-cases/seo-tracking - Content Aggregation: https://spider.cloud/use-cases/content-aggregation - Lead Generation: https://spider.cloud/use-cases/lead-generation - Market Research: https://spider.cloud/use-cases/market-research - Website Archiving: https://spider.cloud/use-cases/website-archiving ## Guides - Scraping & Crawling: https://spider.cloud/docs/core/scraping-crawling - Real-time Search: https://spider.cloud/docs/core/realtime-search - Efficient Scraping: https://spider.cloud/docs/core/efficient-scraping - Concurrent Streaming: https://spider.cloud/docs/core/concurrent-streaming - JSON Scraping: https://spider.cloud/docs/advanced/json-scraping - Webhooks: https://spider.cloud/docs/core/webhooks - Data Connectors: https://spider.cloud/docs/core/data-connectors - Error Codes: https://spider.cloud/docs/core/error-codes - Rate Limits: https://spider.cloud/docs/core/rate-limits - Use Case Guides: https://spider.cloud/docs/guides/use-cases - Recipes: https://spider.cloud/docs/guides/recipes ## Quick Example ```python import requests response = requests.post( "https://api.spider.cloud/crawl", headers={"Authorization": "Bearer YOUR_API_KEY"}, json={ "url": "https://example.com", "limit": 10, "return_format": "markdown" } ) for page in response.json(): print(page["url"], len(page["content"]), "chars") ``` ```javascript const response = await fetch("https://api.spider.cloud/crawl", { method: "POST", headers: { "Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json" }, body: JSON.stringify({ url: "https://example.com", limit: 10, return_format: "markdown" }) }); const pages = await response.json(); ``` ```bash curl -X POST https://api.spider.cloud/crawl \ -H "Authorization: Bearer YOUR_API_KEY" \ -H "Content-Type: application/json" \ -d '{"url": "https://example.com", "limit": 10, "return_format": "markdown"}' ```