Crawl and scrape any website
One API call. Get back HTML, Markdown, or JSON from every page on a site. Proxies, JavaScript rendering, and anti-bot bypass are all handled for you.
How it works
Three steps from URL to structured data. No infrastructure to manage.
1. Submit a URL
Send a target URL to the API with your crawl depth, page limit, and output format.
2. Spider crawls the site
Spider follows links, renders JavaScript, rotates proxies, and handles anti-bot protections automatically.
3. Get clean data back
Receive structured content as HTML, Markdown, plain text, JSON, screenshots, or PDF. Stream or batch.
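Concretely, the three steps reduce to one authenticated POST. A minimal Python sketch of step 1 (the endpoint and field names mirror the curl example further down this page; treat them as illustrative, not a complete API reference):

```python
import json
import urllib.request

API_URL = "https://api.spider.cloud/crawl"  # endpoint from the curl example below

def build_crawl_request(api_key: str, url: str, limit: int = 50,
                        return_format: str = "markdown") -> urllib.request.Request:
    """Step 1: package the target URL, page limit, and output format."""
    payload = {"url": url, "limit": limit, "return_format": return_format}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Steps 2 and 3 happen server-side: Spider crawls, then returns structured pages.
# resp = urllib.request.urlopen(build_crawl_request(key, "https://example.com"))
```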
Crawl modes
Pick the rendering strategy that fits your target. Switch per-request with a single parameter.
HTTP
Direct HTTP fetching without a browser. Ideal for static sites, blogs, and documentation.
~50ms per page
Smart
Auto-detects whether a page needs JavaScript rendering. Uses a browser only when necessary.
~200ms per page
Browser
Full headless Chrome with JavaScript execution. Handles SPAs, lazy-loaded content, and infinite scroll.
~1s per page
Start in minutes
A single API call. Pick your language.
curl -X POST https://api.spider.cloud/crawl \
-H "Authorization: Bearer $SPIDER_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"limit": 50,
"return_format": "markdown"
}'
Output formats
One crawl, multiple formats. Switch with a single parameter.
HTML
Raw or cleaned HTML with optional tag filtering
Markdown
Clean Markdown ideal for LLMs and RAG pipelines
Plain Text
Stripped text content with no markup
JSON
Structured extraction via AI-powered schemas
Screenshot
Full-page PNG or viewport captures
PDF
Browser-rendered PDF exports of any page
Built for scale
Infrastructure that handles millions of pages.
Elastic concurrency
Auto-scales concurrent connections to match your crawl volume. 10 pages to 10 million, same code.
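Client-side, the same idea can be approximated with a bounded-concurrency fetch loop; a sketch of the pattern (the `fetch` function here is a stand-in, not part of the Spider API):

```python
import asyncio

async def crawl_all(urls, fetch, max_concurrency: int = 10):
    """Fetch many pages, never more than max_concurrency at once."""
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        async with sem:          # blocks while max_concurrency tasks are in flight
            return await fetch(url)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(bounded(u) for u in urls))

# Example with a stand-in fetcher:
async def fake_fetch(url):
    await asyncio.sleep(0)       # pretend network I/O
    return f"<html>{url}</html>"

pages = asyncio.run(crawl_all([f"https://example.com/{i}" for i in range(3)], fake_fetch))
```

The same code shape covers 10 pages or 10 million; only `max_concurrency` changes.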
Proxy rotation
Automatic IP rotation across residential and datacenter proxies. Spider selects the optimal proxy type per domain.
Anti-bot bypass
Handles Cloudflare, Akamai, PerimeterX, and other detection systems. Fingerprints rotate per request.
HTTP caching
Previously crawled pages are cached and served instantly on repeat requests.
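The caching behavior is the classic memoize-by-URL pattern; a conceptual sketch of what a repeat request avoids (not Spider's actual implementation):

```python
def make_cached_fetcher(fetch):
    """Wrap a fetch function so repeat URLs are served from memory."""
    cache = {}

    def cached_fetch(url):
        if url not in cache:            # first request: hit the network
            cache[url] = fetch(url)
        return cache[url]               # repeat requests: served instantly

    return cached_fetch

calls = []
fetcher = make_cached_fetcher(lambda url: calls.append(url) or f"body:{url}")
fetcher("https://example.com")
fetcher("https://example.com")          # cache hit; no second fetch
```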
Webhooks
Get notified when crawls complete. Process pages as they're discovered instead of waiting for the full crawl.
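A webhook consumer is just an HTTP endpoint that accepts POSTed events; a minimal stdlib sketch (the payload shape and `url` field are illustrative assumptions, not Spider's documented schema):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def process_page(event: dict) -> str:
    """Handle one crawl event; here we just pull out a page URL (assumed field name)."""
    return event.get("url", "<unknown>")

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length))
        process_page(event)            # process pages as they're discovered
        self.send_response(204)        # acknowledge fast; do heavy work elsewhere
        self.end_headers()

# To run: HTTPServer(("", 8080), WebhookHandler).serve_forever()
```

Acknowledging quickly and deferring heavy processing keeps the sender from timing out and retrying.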
FAQ
What is web crawling?
Web crawling is the automated process of discovering and fetching web pages by following links. A web crawler starts from one or more seed URLs, downloads the page content, extracts links, and repeats the process. Spider handles this entire workflow through a single API call. You provide a URL and Spider returns structured data from every reachable page.
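That seed-fetch-extract loop is small enough to show directly; a toy breadth-first version over an in-memory link graph (Spider runs the same loop against real pages at scale):

```python
from collections import deque

def crawl(seed: str, get_links) -> list:
    """Breadth-first crawl: fetch a page, extract its links, repeat for unseen URLs."""
    seen, order, queue = {seed}, [], deque([seed])
    while queue:
        url = queue.popleft()
        order.append(url)                  # "download" the page
        for link in get_links(url):        # extract links from its content
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

# Toy site: each page maps to the links it contains.
site = {"/": ["/about", "/blog"], "/about": [],
        "/blog": ["/blog/post-1"], "/blog/post-1": ["/"]}
pages = crawl("/", lambda u: site.get(u, []))  # ["/", "/about", "/blog", "/blog/post-1"]
```

The `seen` set is what keeps a crawler from looping forever on sites whose pages link back to the homepage.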
Is web scraping legal?
Web scraping publicly available data is generally legal in the United States, supported by the hiQ Labs v. LinkedIn ruling. However, you should always respect robots.txt directives, terms of service, and avoid scraping personal data without consent. Spider provides built-in robots.txt compliance and rate limiting to help you crawl responsibly.
How many pages can Spider crawl?
Spider can crawl millions of pages per job with no hard limit. Our infrastructure auto-scales to handle crawls of any size, from a single page to an entire domain with hundreds of thousands of URLs. Crawl speed depends on your plan and concurrency settings, with enterprise users reaching 500+ pages per second.
What's the difference between crawling and scraping?
Web crawling is about discovery: following links to find pages across a site. Web scraping is about extraction: pulling specific data from those pages. Spider does both. It crawls your target site to discover all pages, then scrapes each page to return clean, structured data in your chosen format (HTML, Markdown, JSON, and more).
Does Spider handle JavaScript-rendered pages?
Yes. Spider offers three rendering modes: HTTP mode for fast static page fetching, Smart mode that auto-detects whether JavaScript rendering is needed, and Browser mode that uses a full headless browser to render JavaScript-heavy single-page applications. Browser mode supports the same interactions a real user would perform.
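Selecting a mode is one more field in the request body; a sketch, where the `request` field name and its lowercase values are assumptions rather than confirmed API details (check the API reference before use):

```python
def build_payload(url: str, mode: str = "smart") -> dict:
    """Attach a rendering mode to the crawl payload.
    NOTE: the 'request' key and mode values are assumed, not taken from
    official Spider docs; verify against the API reference."""
    if mode not in {"http", "smart", "browser"}:
        raise ValueError(f"unknown mode: {mode}")
    return {"url": url, "return_format": "markdown", "request": mode}

# Static blog -> plain HTTP; JavaScript-heavy SPA -> full browser.
payloads = [build_payload("https://blog.example.com", "http"),
            build_payload("https://app.example.com", "browser")]
```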
Start collecting web data
Free credits on signup, no card required. Crawl your first site in under a minute.