Crawl any website. One call.
Pass a URL. Get HTML, Markdown, or JSON back from every reachable page. Proxies, JS rendering, and anti-bot protection are handled for you.
- Pages per second: 100k+
- Success rate: 99.9%
- Monthly minimum: $0
URL in, structured data out.
Submit a URL
Send a target URL with depth, page limit, and output format.
Spider walks the site
Follows links, renders JS, rotates proxies, and handles anti-bot protection.
Clean data back
Receive HTML, Markdown, plain text, JSON, screenshots, or PDF, streamed as pages arrive or delivered in one batch.
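Streaming means you handle each page as soon as it arrives instead of waiting for one large response. A minimal sketch of consuming such a stream, assuming (hypothetically) that each page is delivered as one JSON object per line (JSON Lines) with `url` and `content` fields; check the API reference for the actual streaming format and field names.

```python
import io
import json
from typing import IO, Iterator


def iter_pages(stream: IO[bytes]) -> Iterator[dict]:
    """Yield one parsed page per non-empty line of a JSON Lines byte stream.

    Assumes a newline-delimited JSON body, so each page can be processed
    as soon as its line has been read.
    """
    for raw_line in stream:
        line = raw_line.strip()
        if not line:
            continue  # tolerate blank keep-alive lines
        yield json.loads(line)


# Simulated body; a real one would be read line by line from the HTTP response.
body = io.BytesIO(
    b'{"url": "https://example.com/", "content": "# Home"}\n'
    b"\n"
    b'{"url": "https://example.com/about", "content": "# About"}\n'
)

for page in iter_pages(body):
    print(page["url"])
```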
Three rendering strategies. Switch per request.
HTTP
Direct fetch, no browser. Static sites, blogs, docs.
~50ms / page
Smart (default)
Detects when JS rendering is needed and uses a browser only when it has to.
~200ms / page
Browser
Full headless Chrome. SPAs, lazy loading, infinite scroll.
~1s / page
Same payload in any language.
curl -X POST https://api.spider.cloud/crawl \
  -H "Authorization: Bearer $SPIDER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "limit": 50,
    "return_format": "markdown"
  }'
Switch the format with one parameter.
HTML
Raw or cleaned, with optional tag filtering
Markdown
Clean Markdown for LLMs and RAG pipelines
Plain text
Stripped text content, no markup
JSON
Typed extraction via AI-powered schemas
Screenshot
Full-page PNG or viewport capture
PDF
Browser-rendered PDF of any page
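The curl request above translates directly to other languages. A sketch using only Python's standard library that builds the same POST request, with `return_format` as the one parameter that changes. The endpoint, headers, and `url`/`limit`/`return_format` fields come from the curl example; the `build_crawl_request` helper is hypothetical, and format values other than `markdown` should be taken from the API reference.

```python
import json
import os
import urllib.request

API_URL = "https://api.spider.cloud/crawl"


def build_crawl_request(url: str, limit: int, return_format: str) -> urllib.request.Request:
    """Build (but do not send) the same request as the curl example.

    Hypothetical helper; only the fields shown in the curl example are used.
    """
    payload = {"url": url, "limit": limit, "return_format": return_format}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('SPIDER_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


markdown_req = build_crawl_request("https://example.com", 50, "markdown")
# To send it (requires SPIDER_API_KEY): urllib.request.urlopen(markdown_req)
```

Changing the output format means passing a different `return_format` string; the rest of the request stays the same.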
Infrastructure that handles millions of pages.
Elastic concurrency
Auto-scales concurrent connections to match crawl volume. 10 pages to 10 million, same code.
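The scaling described above happens inside Spider, not in your code. To illustrate what capping concurrent connections means, here is a minimal client-side sketch with `asyncio.Semaphore`; `fake_fetch` is a hypothetical stand-in for a network call, and the fixed cap of 3 is arbitrary (an auto-scaling system would adjust the cap rather than fix it).

```python
import asyncio


async def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for a network fetch."""
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


async def crawl_bounded(urls: list[str], max_concurrency: int) -> tuple[list[str], int]:
    """Fetch all URLs with at most `max_concurrency` requests in flight.

    Returns the results in input order and the peak number of concurrent fetches.
    """
    semaphore = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0

    async def worker(url: str) -> str:
        nonlocal in_flight, peak
        async with semaphore:  # waits here while the cap is reached
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_fetch(url)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(worker(u) for u in urls))
    return list(results), peak


urls = [f"https://example.com/page/{i}" for i in range(10)]
pages, peak = asyncio.run(crawl_bounded(urls, max_concurrency=3))
print(len(pages), peak)
```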
Proxy rotation
Automatic IP rotation across residential and datacenter pools. Spider selects the proxy type per domain.
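One way to picture per-domain proxy selection: pick a pool type for each domain, then rotate round-robin within that pool. Everything below is hypothetical: the `PROXY_POOLS` endpoints (on the reserved `.invalid` domain), the `STRICT_DOMAINS` rule, and the `pick_proxy` helper. Spider's actual selection logic is not described here.

```python
import itertools
from urllib.parse import urlsplit

# Hypothetical proxy pools; real endpoints and credentials would differ.
PROXY_POOLS = {
    "datacenter": ["http://dc1.proxy.invalid:8080", "http://dc2.proxy.invalid:8080"],
    "residential": [
        "http://res1.proxy.invalid:8080",
        "http://res2.proxy.invalid:8080",
        "http://res3.proxy.invalid:8080",
    ],
}

# Hypothetical rule: domains known to block datacenter IPs get residential proxies.
STRICT_DOMAINS = {"shop.example.com"}

# One endless round-robin iterator per pool type.
_rotators = {kind: itertools.cycle(pool) for kind, pool in PROXY_POOLS.items()}


def pick_proxy(url: str) -> str:
    """Return the next proxy for the URL's domain, rotating within the chosen pool."""
    host = urlsplit(url).hostname or ""
    kind = "residential" if host in STRICT_DOMAINS else "datacenter"
    return next(_rotators[kind])


print(pick_proxy("https://blog.example.com/a"))
print(pick_proxy("https://blog.example.com/b"))
```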
Anti-bot bypass
Cloudflare, Akamai, PerimeterX, and more. Fingerprints rotate per request.
HTTP caching
Previously crawled pages are cached and served instantly on repeat requests.
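Conceptually, such a cache maps a URL to a previously fetched response and serves it again until it goes stale. A minimal in-memory sketch with a time-to-live; the `PageCache` class, the 60-second TTL, and the injected fetch function are illustrative assumptions, not Spider's cache implementation.

```python
import time
from typing import Callable


class PageCache:
    """URL-keyed cache that re-fetches a page only after its entry expires."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, str]] = {}  # url -> (fetched_at, body)

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh hit: no network call
        body = self._fetch(url)  # miss or stale entry: fetch and store
        self._entries[url] = (now, body)
        return body


calls = []


def counting_fetch(url: str) -> str:
    """Hypothetical fetch that records how often the network would be hit."""
    calls.append(url)
    return f"<html>{url}</html>"


cache = PageCache(counting_fetch, ttl_seconds=60)
cache.get("https://example.com/")
cache.get("https://example.com/")  # served from the cache
print(len(calls))  # 1
```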
Webhooks
Events fire as pages are discovered, so you can process results as they arrive instead of waiting for the whole crawl.
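On the receiving side, a webhook is an HTTP POST per event. A minimal receiver using Python's `http.server`, assuming each event carries a JSON body with a hypothetical `url` field; the real event payload and how webhooks are registered are defined in Spider's documentation, not here. The script posts one simulated event to itself so it can run locally.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received: list[dict] = []


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length) or b"{}")
        received.append(event)  # process each page as its event arrives
        self.send_response(204)
        self.end_headers()

    def log_message(self, format: str, *args) -> None:
        pass  # keep the demo quiet


# Port 0 lets the OS choose a free port for this local demo.
server = HTTPServer(("127.0.0.1", 0), WebhookHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Simulate one incoming event; in production the crawler would be the sender.
port = server.server_address[1]
req = urllib.request.Request(
    f"http://127.0.0.1:{port}/webhook",
    data=json.dumps({"url": "https://example.com/about"}).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)

server.shutdown()
server.server_close()
```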
What you'll want to know.
What is web crawling?
Automated discovery of pages by following links. A crawler starts from one or more seed URLs, fetches each page, extracts its links, and repeats. Spider runs the full loop behind one API call: you pass a URL and get structured data back from every reachable page.
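The fetch, extract, repeat loop is a breadth-first traversal of the site's link graph. A minimal sketch over an in-memory site, where the hypothetical `SITE` dictionary stands in for fetching a page and extracting its links, and `limit` caps the number of pages like the API's page limit. A real crawler would also normalize URLs, stay on the target domain, and respect robots.txt and rate limits.

```python
from collections import deque

# Hypothetical site: each URL maps to the links found on that page.
SITE = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/", "https://example.com/c"],
    "https://example.com/b": ["https://example.com/c"],
    "https://example.com/c": [],
}


def crawl(seed: str, limit: int) -> list[str]:
    """Breadth-first crawl from `seed`, visiting at most `limit` pages."""
    seen = {seed}
    queue = deque([seed])
    visited = []
    while queue and len(visited) < limit:
        url = queue.popleft()
        visited.append(url)             # "fetch" the page
        for link in SITE.get(url, []):  # extract its links
            if link not in seen:        # enqueue each page only once
                seen.add(link)
                queue.append(link)
    return visited


print(crawl("https://example.com/", limit=50))
```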
Is web scraping legal?
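Scraping publicly accessible data is generally lawful in the United States; in hiQ Labs v. LinkedIn, the Ninth Circuit found that scraping public pages likely does not violate the Computer Fraud and Abuse Act. Other laws and site terms can still apply, so respect robots.txt, terms of service, and rate limits, and do not scrape personal data without consent. Spider honors robots.txt by default and includes built-in rate limiting.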
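If you run your own fetches alongside Spider, checking robots.txt is simple with Python's standard `urllib.robotparser`. The sketch parses an example robots.txt inline instead of downloading it; the rules and the `MyCrawler` user agent are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content (normally fetched from https://<host>/robots.txt).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

for url in ["https://example.com/docs", "https://example.com/private/report"]:
    print(url, robots.can_fetch("MyCrawler", url))

# Seconds to wait between requests if the site specifies it, else None.
print(robots.crawl_delay("MyCrawler"))
```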
How many pages can Spider crawl?
No hard limit. The infrastructure auto-scales from a single page to millions per job. Throughput depends on your plan and concurrency; enterprise users reach 500+ pages per second.
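To turn a pages-per-second figure into wall-clock time, divide the page count by the sustained rate. A quick sketch using the 500 pages per second quoted above; actual throughput varies with plan, concurrency, and the target site.

```python
def crawl_duration_seconds(pages: int, pages_per_second: float) -> float:
    """Estimated wall-clock time for a crawl at a sustained rate."""
    return pages / pages_per_second


seconds = crawl_duration_seconds(1_000_000, 500)
print(f"{seconds:.0f} s = about {seconds / 60:.0f} minutes")  # 2000 s = about 33 minutes
```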
What's the difference between crawling and scraping?
Crawling is discovery — following links to find pages. Scraping is extraction — pulling specific data from those pages. Spider does both: it walks the target site and returns clean output (HTML, Markdown, plain text, JSON, screenshot, or PDF) per page.
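The distinction shows up clearly in code: the same HTML yields links (what crawling follows) and content (what scraping keeps). A sketch with Python's standard `html.parser`; the `PageParser` class is illustrative only, since Spider performs both steps for you.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collect links (crawling) and visible text (scraping) from one HTML page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []
        self.text_parts: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))  # discovery

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())  # extraction


html = """<html><body><h1>Docs</h1>
<script>track()</script>
<p>Read the <a href="/guide">guide</a> or the <a href="https://example.com/faq">FAQ</a>.</p>
</body></html>"""

page = PageParser("https://example.com/docs/")
page.feed(html)
page.close()
print(page.links)                # crawl: where to go next
print(" ".join(page.text_parts))  # scrape: what the page says
```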
Does Spider handle JavaScript-rendered pages?
Yes. Three modes: HTTP (fast static fetch), Smart (auto-detects whether JS rendering is needed), and Browser (full headless Chrome for SPAs, lazy loading, and interactions).
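Spider's exact detection logic for Smart mode is not described here, but the general idea can be sketched as a cheap check on the plain-HTTP response: if the HTML has little visible text and an empty app container or several scripts, it probably needs a browser. The `needs_browser` function, its 200-character threshold, the script count, and the container ids it checks are all hypothetical.

```python
import re

# Hypothetical markers of client-side rendered apps (empty mount points).
EMPTY_APP_ROOT = re.compile(
    r'<div[^>]+id=["\'](?:root|app|__next)["\'][^>]*>\s*</div>', re.IGNORECASE
)
SCRIPT_TAG = re.compile(r"<script\b", re.IGNORECASE)
# Removes whole <script> elements first, then any remaining tags.
TAGS_AND_SCRIPTS = re.compile(r"<script\b.*?</script>|<[^>]+>", re.IGNORECASE | re.DOTALL)


def needs_browser(html: str, min_text_chars: int = 200) -> bool:
    """Guess whether a page fetched over plain HTTP must be re-rendered in a browser."""
    visible_text = TAGS_AND_SCRIPTS.sub(" ", html)
    text_len = len(" ".join(visible_text.split()))
    has_empty_root = bool(EMPTY_APP_ROOT.search(html))
    script_count = len(SCRIPT_TAG.findall(html))
    # Thin content plus an empty mount point or script-heavy markup suggests an SPA.
    return text_len < min_text_chars and (has_empty_root or script_count >= 3)


static_page = "<html><body><h1>Blog</h1><p>" + "Plain server-rendered text. " * 20 + "</p></body></html>"
spa_page = '<html><body><div id="root"></div><script src="/bundle.js"></script></body></html>'

print(needs_browser(static_page))  # False
print(needs_browser(spa_page))     # True
```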