# Spider

> The fastest web crawling and scraping API for AI agents, RAG pipelines, and LLMs.
> Crawl 100K+ pages per second. Get clean markdown, structured JSON, and AI-ready data.

- Website: https://spider.cloud
- API Base URL: https://api.spider.cloud
- Documentation: https://spider.cloud/docs/overview
- API Reference: https://spider.cloud/docs/api
- OpenAPI Spec: https://spider.cloud/openapi.yaml
- Pricing: https://spider.cloud/credits/new
- MCP Server: https://www.npmjs.com/package/spider-cloud-mcp (npx spider-cloud-mcp)
- Open Source: https://github.com/spider-rs/spider (MIT License)
- Discord: https://discord.spider.cloud
- Support: support@spider.cloud

## What Spider Does

Spider is a web crawling and scraping API for developers building AI applications. It converts websites into clean, structured data optimized for LLMs, RAG pipelines, and AI agent workflows.

Key capabilities:
- Crawl entire websites at 100K+ pages per second
- Output formats: Markdown, HTML, JSON, JSONL, CSV, XML, plain text
- JavaScript rendering with headless Chrome and Firefox
- AI-powered natural language extraction (no CSS selectors needed)
- Real-time web search API
- Screenshot capture
- Anti-bot bypass and proxy rotation
- Webhook and cron scheduling
- Pay-per-use pricing (no subscription required for the core endpoints; the AI endpoints require one)

## Getting Started

- Quickstart: https://spider.cloud/docs/quickstart
- Get API Key: https://spider.cloud/api-keys
- Interactive Playground: https://spider.cloud/playground
- Concepts: https://spider.cloud/docs/concepts

## API Endpoints

### Core Endpoints

- POST /crawl - Crawl websites and extract content from multiple pages
- POST /scrape - Scrape a single page
- POST /search - Search the web, optionally crawl results
- POST /links - Extract all links from a page
- POST /screenshot - Capture page screenshots
- POST /unblocker - Access bot-protected content
- POST /transform - Convert HTML to markdown/text/other formats
- GET /data/credits - Check available credits
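
As a minimal sketch of how the endpoints above map to HTTP requests, the snippet below builds (but does not send) a request for any core endpoint using only the Python standard library. The paths, base URL, and Bearer header come from this document; the names `CORE_ENDPOINTS` and `build_request` are hypothetical helpers introduced for illustration, and the payload fields are the ones used in the Quick Example.

```python
import json
import urllib.request

API_BASE = "https://api.spider.cloud"

# Method and path for each core endpoint, copied from the list above.
CORE_ENDPOINTS = {
    "crawl": ("POST", "/crawl"),
    "scrape": ("POST", "/scrape"),
    "search": ("POST", "/search"),
    "links": ("POST", "/links"),
    "screenshot": ("POST", "/screenshot"),
    "unblocker": ("POST", "/unblocker"),
    "transform": ("POST", "/transform"),
    "credits": ("GET", "/data/credits"),
}


def build_request(name, api_key, payload=None):
    """Return an unsent urllib Request for one of the core endpoints.

    Hypothetical helper for illustration; not part of any Spider SDK.
    """
    if name not in CORE_ENDPOINTS:
        raise ValueError(f"unknown endpoint: {name}")
    method, path = CORE_ENDPOINTS[name]
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if method == "POST":
        # POST bodies are sent as JSON, as in the Quick Example.
        data = json.dumps(payload or {}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        API_BASE + path, data=data, headers=headers, method=method
    )


# Usage: build the same crawl request as the Quick Example, without sending it.
req = build_request(
    "crawl",
    "YOUR_API_KEY",
    {"url": "https://example.com", "limit": 10, "return_format": "markdown"},
)
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; keeping construction separate from sending makes the request easy to inspect or log first.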

### AI Endpoints (Subscription Required)

Natural-language web data extraction. Describe what you need in plain English:

- POST /ai/crawl - AI-guided crawling with natural language prompts
- POST /ai/scrape - Extract structured data using plain English
- POST /ai/search - AI-enhanced semantic web search
- POST /ai/browser - Automate browser interactions with natural language
- POST /ai/links - Intelligent link extraction and filtering

AI pricing: https://spider.cloud/ai/pricing

## Authentication

All requests require a Bearer token:

```
Authorization: Bearer YOUR_API_KEY
```
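
One way to avoid hard-coding the key is to read it from the environment. The sketch below assumes a variable named `SPIDER_API_KEY`; that name and the `auth_headers` helper are hypothetical choices for this example, not something Spider defines.

```python
import os


def auth_headers(api_key=None):
    """Build the Authorization header for a Spider request.

    Falls back to the SPIDER_API_KEY environment variable, a
    hypothetical name chosen for this sketch.
    """
    key = api_key or os.environ.get("SPIDER_API_KEY")
    if not key:
        raise RuntimeError("no API key: pass api_key or set SPIDER_API_KEY")
    return {"Authorization": f"Bearer {key}"}


# Usage: pass the result as the `headers` argument of your HTTP client.
print(auth_headers("YOUR_API_KEY"))
```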

## SDKs & Libraries

- Python: https://spider.cloud/docs/libraries
- Node.js/JavaScript: https://spider.cloud/docs/libraries
- Rust: https://spider.cloud/docs/libraries
- Go: https://spider.cloud/docs/libraries

## Integrations

Spider integrates with popular AI frameworks:

- LangChain: https://spider.cloud/docs/integrations/langchain
- LlamaIndex: https://spider.cloud/docs/integrations/llamaindex
- CrewAI: https://spider.cloud/docs/integrations/crewai
- Agno: https://spider.cloud/docs/integrations/agno
- FlowiseAI: https://spider.cloud/docs/integrations/flowiseai
- Zapier: https://spider.cloud/docs/integrations/zapier
- x402 (Crypto Payments): https://spider.cloud/docs/integrations/x402

## Use Cases

- RAG Pipelines: https://spider.cloud/use-cases/rag
- AI Training Data: https://spider.cloud/use-cases/ai-training
- Price Monitoring: https://spider.cloud/use-cases/price-monitoring
- SEO Tracking: https://spider.cloud/use-cases/seo-tracking
- Content Aggregation: https://spider.cloud/use-cases/content-aggregation
- Lead Generation: https://spider.cloud/use-cases/lead-generation
- Market Research: https://spider.cloud/use-cases/market-research
- Website Archiving: https://spider.cloud/use-cases/website-archiving

## Guides

- Scraping & Crawling: https://spider.cloud/docs/core/scraping-crawling
- Real-time Search: https://spider.cloud/docs/core/realtime-search
- Efficient Scraping: https://spider.cloud/docs/core/efficient-scraping
- Concurrent Streaming: https://spider.cloud/docs/core/concurrent-streaming
- JSON Scraping: https://spider.cloud/docs/advanced/json-scraping
- Webhooks: https://spider.cloud/docs/core/webhooks
- Data Connectors: https://spider.cloud/docs/core/data-connectors
- Error Codes: https://spider.cloud/docs/core/error-codes
- Rate Limits: https://spider.cloud/docs/core/rate-limits
- Use Case Guides: https://spider.cloud/docs/guides/use-cases
- Recipes: https://spider.cloud/docs/guides/recipes

## Quick Example

```python
import requests

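# Crawl up to 10 pages of example.com and return each page as markdown.
response = requests.post(
    "https://api.spider.cloud/crawl",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "url": "https://example.com",
        "limit": 10,
        "return_format": "markdown",
    },
)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body

# The response is a list with one entry per crawled page.
for page in response.json():
    print(page["url"], len(page["content"]), "chars")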
```

```javascript
const response = await fetch("https://api.spider.cloud/crawl", {
  method: "POST",
  headers: {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    url: "https://example.com",
    limit: 10,
    return_format: "markdown"
  })
});

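// Fail fast on HTTP errors before parsing the body.
if (!response.ok) {
  throw new Error(`Spider request failed: ${response.status}`);
}

// One entry per crawled page.
const pages = await response.json();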
```

```bash
curl -X POST https://api.spider.cloud/crawl \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "limit": 10, "return_format": "markdown"}'
```