Crawl & Scrape API

Crawl any website. One call.

Pass a URL. Get HTML, Markdown, or JSON back from every reachable page. Proxies, JS rendering, and anti-bot are handled for you.

POST /crawl Live
example.com
├── /home 200 · 47KB · md
├── /products 200 · 82KB · md
│ ├── /a 200 · 31KB · md
│ └── /b 200 · 29KB · md
└── /blog 200 · 19KB · md
crawl complete · 5/5 OK · 2.1s · $0.05
100k+ pages per second · 99.9% success rate · $0 monthly minimum
01 · How it works

URL in, structured data out.

01

Submit a URL

Send a target URL with depth, page limit, and output format.

02

Spider walks the site

Follows links, renders JS, rotates proxies, handles anti-bot.

03

Clean data back

Receive HTML, Markdown, plain text, JSON, screenshots, or PDF — stream or batch.

02 · Crawl modes

Three rendering strategies. Switch per request.

HTTP

Direct fetch, no browser. Static sites, blogs, docs.

~50ms / page

Smart (default)

Detects when JS rendering is needed. Browser only when it has to be.

~200ms / page

Browser

Full headless Chrome. SPAs, lazy loading, infinite scroll.

~1s / page
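Here is a rough Python sketch of the kind of check a Smart mode could make. It parses the HTML returned by a plain HTTP fetch and escalates to a browser only when the page has almost no visible text but does ship scripts or an empty app root. The root IDs and the 200-character threshold are assumptions for illustration, not Spider's actual detection rules.

```python
from html.parser import HTMLParser

# Assumed markers of client-side-rendered apps (illustrative, not Spider's rules).
SPA_ROOT_IDS = {"root", "app", "__next", "__nuxt"}


class PageStats(HTMLParser):
    """Counts visible text, <script> tags, and SPA root containers."""

    def __init__(self) -> None:
        super().__init__()
        self.visible_chars = 0
        self.script_tags = 0
        self.has_spa_root = False
        self._hidden_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._hidden_depth += 1
            if tag == "script":
                self.script_tags += 1
        if dict(attrs).get("id") in SPA_ROOT_IDS:
            self.has_spa_root = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if not self._hidden_depth:
            self.visible_chars += len(data.strip())


def needs_browser(html: str, min_visible_chars: int = 200) -> bool:
    """Heuristic: little visible text plus scripts or an app root suggests JS rendering."""
    stats = PageStats()
    stats.feed(html)
    stats.close()
    thin = stats.visible_chars < min_visible_chars
    return thin and (stats.has_spa_root or stats.script_tags > 0)


if __name__ == "__main__":
    static_page = "<article><p>" + "Static docs page. " * 20 + "</p></article>"
    spa_shell = '<div id="root"></div><script src="/bundle.js"></script>'
    print(needs_browser(static_page))  # plenty of server-rendered text
    print(needs_browser(spa_shell))    # empty shell waiting for JS
```

The point of the design is that the cheap HTTP fetch always happens first, and the expensive browser render is paid only for pages that fail the check.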
03 · Single call

Same payload in any language.

# POST /crawl
curl -X POST https://api.spider.cloud/crawl \
  -H "Authorization: Bearer $SPIDER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "limit": 50,
    "return_format": "markdown"
  }'
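For reference, here is the same payload in Python using only the standard library. The sketch builds the request without sending it. The endpoint, headers, and the `url`, `limit`, and `return_format` fields are copied from the curl call above, and the helper name `build_crawl_request` is hypothetical. The response format isn't documented on this page, so the sketch stops short of parsing a response.

```python
import json
import os
import urllib.request
from typing import Optional

API_URL = "https://api.spider.cloud/crawl"


def build_crawl_request(
    url: str,
    limit: int = 50,
    return_format: str = "markdown",
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build, but do not send, the POST /crawl request from the curl example."""
    key = api_key if api_key is not None else os.environ.get("SPIDER_API_KEY", "")
    payload = {"url": url, "limit": limit, "return_format": return_format}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_crawl_request("https://example.com")
    print(request.get_method(), request.full_url)
    # Sending requires network access and a valid key:
    # with urllib.request.urlopen(request) as response:
    #     body = response.read()
```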
04 · Output formats

Switch the format with one parameter.

HTML

Raw or cleaned, with optional tag filtering

Markdown

Clean Markdown for LLMs and RAG pipelines

Plain text

Stripped text content, no markup

JSON

Typed extraction via AI-powered schemas

Screenshot

Full-page PNG or viewport capture

PDF

Browser-rendered PDF of any page
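To make "stripped text content, no markup" concrete, here is a minimal HTML-to-text converter built on Python's `html.parser`. It illustrates the idea and is not the converter Spider uses. It drops `script`, `style`, and `noscript` content, breaks lines at common block-level tags, and collapses whitespace. The tag sets are illustrative choices.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style", "noscript"}
BLOCK_TAGS = {"p", "div", "br", "li", "ul", "ol", "tr", "section", "article",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping non-visible elements."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True (default) decodes &amp; etc.
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            if self._skip_depth:
                self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    print(html_to_text("<h1>Docs</h1><p>Plain <em>text</em> only.</p>"))
```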

05 · Built for scale

Infrastructure that handles millions of pages.

Elastic concurrency

Auto-scales concurrent connections to match crawl volume. 10 pages to 10 million, same code.
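On the client side, capping concurrent connections can be sketched with an `asyncio.Semaphore`. `crawl_bounded` and `SimulatedFetcher` are hypothetical names. The fetcher is a stand-in for real network I/O and only records how many fetches are in flight so the cap can be checked.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List


async def crawl_bounded(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrency: int = 10,
) -> List[str]:
    """Fetch every URL, never running more than max_concurrency fetches at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(url: str) -> str:
        async with semaphore:
            return await fetch(url)

    # gather returns results in the same order as the input URLs.
    return list(await asyncio.gather(*(worker(u) for u in urls)))


class SimulatedFetcher:
    """Hypothetical stand-in for network I/O that tracks peak in-flight fetches."""

    def __init__(self) -> None:
        self.in_flight = 0
        self.peak = 0

    async def fetch(self, url: str) -> str:
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        await asyncio.sleep(0.001)  # pretend to wait on the network
        self.in_flight -= 1
        return f"<html>{url}</html>"


if __name__ == "__main__":
    fetcher = SimulatedFetcher()
    pages = asyncio.run(
        crawl_bounded([f"https://example.com/{i}" for i in range(100)], fetcher.fetch, 8)
    )
    print(len(pages), fetcher.peak)
```

`gather` schedules a coroutine for every URL up front. For very large URL lists, a fixed pool of worker tasks reading from an `asyncio.Queue` would keep memory flat.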

Proxy rotation

Automatic IP rotation across residential and datacenter pools. Spider selects the proxy type per domain.

Anti-bot bypass

Cloudflare, Akamai, PerimeterX, and more. Fingerprints rotate per request.

HTTP caching

Previously crawled pages are cached, so repeat requests are served from the cache instead of being fetched again.
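Here is a minimal sketch of URL-keyed caching. It assumes a time-to-live on each entry. The source doesn't say how long Spider keeps cached pages, so `ttl_seconds` is an illustrative parameter, and `PageCache` and its `fetch` callback are hypothetical names.

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """URL-keyed page cache with an assumed time-to-live per entry."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}  # url -> (stored_at, body)
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            self.hits += 1  # fresh copy: no network request
            return entry[1]
        self.misses += 1  # missing or expired: fetch and store
        body = self._fetch(url)
        self._entries[url] = (now, body)
        return body


if __name__ == "__main__":
    cache = PageCache(lambda url: f"<html>{url}</html>")
    cache.get("https://example.com/")
    cache.get("https://example.com/")
    print(cache.hits, cache.misses)  # 1 1
```

The injectable `clock` makes expiry testable without sleeping.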

Webhooks

Webhooks fire as pages are discovered, so you can process results as a stream instead of waiting for the whole crawl to finish.

06 · Common questions

What you'll want to know.

What is web crawling?

Automated discovery of pages by following links. A crawler starts from one or more seed URLs, fetches the page, extracts links, and repeats. Spider does the full loop behind one API call — you pass a URL, you get structured data back from every reachable page.
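Here is that discover, fetch, and extract loop as a breadth-first sketch in Python. It runs against a small in-memory site shaped like the example.com demo tree at the top of this page rather than against the network. `get_links` stands in for fetching a page and extracting its links, and `max_depth` and `limit` mirror the depth and page-limit inputs mentioned under How it works. All names are hypothetical.

```python
from collections import deque
from typing import Callable, Iterable, List
from urllib.parse import urldefrag, urljoin, urlparse

# Hypothetical in-memory site: page URL -> links found on that page.
SITE = {
    "https://example.com/": ["/home", "/products", "/blog", "https://other.org/"],
    "https://example.com/products": ["/products/a", "/products/b#reviews"],
    "https://example.com/blog": ["/"],
}


def crawl(
    seed: str,
    get_links: Callable[[str], Iterable[str]],
    max_depth: int = 2,
    limit: int = 50,
) -> List[str]:
    """Breadth-first crawl from seed, staying on the seed's host."""
    host = urlparse(seed).netloc
    seen = {seed}
    queue = deque([(seed, 0)])
    visited: List[str] = []
    while queue and len(visited) < limit:
        url, depth = queue.popleft()
        visited.append(url)  # a real crawler would fetch the page here
        if depth >= max_depth:
            continue  # don't expand links beyond the depth limit
        for href in get_links(url):
            # Resolve relative links and drop #fragments before de-duplicating.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append((absolute, depth + 1))
    return visited


if __name__ == "__main__":
    for page in crawl("https://example.com/", lambda u: SITE.get(u, [])):
        print(page)
```

The `seen` set is what turns an endless link graph (note `/blog` linking back to `/`) into a finite crawl.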

Is web scraping legal?

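Scraping publicly available data is generally legal in the United States. In hiQ Labs v. LinkedIn, the Ninth Circuit held that accessing public pages likely does not violate the Computer Fraud and Abuse Act, though terms of service, copyright, and privacy law still apply. Respect robots.txt, terms of service, and rate limits, and do not scrape personal data without consent. Spider honours robots.txt by default and includes built-in rate limiting.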
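Spider applies robots.txt for you. If you also run your own fetcher, Python's standard-library `urllib.robotparser` can apply the same rules. The robots.txt text, the `load_policy` helper, and the `my-crawler` user agent below are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def load_policy(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    policy = load_policy(ROBOTS_TXT)
    agent = "my-crawler"  # hypothetical user-agent string
    for url in ("https://example.com/blog", "https://example.com/private/report"):
        print(url, policy.can_fetch(agent, url))
    print("crawl delay:", policy.crawl_delay(agent))  # seconds between requests
```

`crawl_delay` returns the value from the matching user-agent group, which a fetcher can use as the minimum pause between requests to that host.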

How many pages can Spider crawl?

No hard limit. The infrastructure auto-scales from a single page to millions per job. Throughput depends on your plan and concurrency; enterprise users reach 500+ pages per second.

What's the difference between crawling and scraping?

Crawling is discovery — following links to find pages. Scraping is extraction — pulling specific data from those pages. Spider does both: it walks the target site and returns clean output (HTML, Markdown, plain text, JSON, screenshot, or PDF) per page.

Does Spider handle JavaScript-rendered pages?

Yes. Three modes: HTTP (fast static fetch), Smart (auto-detects whether JS rendering is needed), and Browser (full headless Chrome for SPAs, lazy loading, and interactions).