
Crawl and scrape any website

One API call. Get back HTML, Markdown, or JSON from every page on a site. Proxies, rendering, and anti-bot detection are all handled for you.

api.spider.cloud · POST /crawl
url: example.com · limit: 100 · format: markdown
→ 100 pages · 2.1s · $0.10
100k+ pages per second · 99.9% success rate · $0 monthly minimum

How it works

Three steps from URL to structured data. No infrastructure to manage.

1. Submit a URL

Send a target URL to the API with your crawl depth, page limit, and output format.

2. Spider crawls the site

Spider follows links, renders JavaScript, rotates proxies, and handles anti-bot protections automatically.

3. Get clean data back

Receive structured content as HTML, Markdown, plain text, JSON, screenshots, or PDF. Stream or batch.
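The three steps above boil down to assembling one request body. A minimal sketch in Python, using the `url`, `limit`, and `return_format` fields from the curl example later on this page; `depth` is a hypothetical crawl-depth field shown for illustration only:

```python
import json

def build_crawl_request(url, limit=100, return_format="markdown", depth=None):
    """Step 1: assemble the JSON body sent to POST /crawl."""
    body = {"url": url, "limit": limit, "return_format": return_format}
    if depth is not None:
        body["depth"] = depth  # hypothetical crawl-depth knob, for illustration
    return json.dumps(body)

print(build_crawl_request("https://example.com", limit=50))
# → {"url": "https://example.com", "limit": 50, "return_format": "markdown"}
```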

Crawl modes

Pick the rendering strategy that fits your target. Switch per-request with a single parameter.

HTTP

Direct HTTP fetching without a browser. Ideal for static sites, blogs, and documentation.

~50ms per page

Smart

Auto-detects whether a page needs JavaScript rendering. Uses a browser only when necessary.

~200ms per page

Browser

Full headless Chrome with JavaScript execution. Handles SPAs, lazy-loaded content, and infinite scroll.

~1s per page
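Switching modes per request can be sketched as below. The parameter name `request` and the mode values `http`, `smart`, and `chrome` are assumptions for illustration, not confirmed field names; check the API reference for the exact spelling:

```python
MODES = ("http", "smart", "chrome")  # assumed values for the three modes above

def crawl_body(url, mode="smart", limit=100):
    # "request" as the mode parameter name is an assumption,
    # not a confirmed API field.
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    return {"url": url, "request": mode, "limit": limit}

print(crawl_body("https://example.com", mode="http"))
```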

Start in minutes

A single API call. Pick your language.

curl -X POST https://api.spider.cloud/crawl \
  -H "Authorization: Bearer $SPIDER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "limit": 50,
    "return_format": "markdown"
  }'
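The same call in Python, using only the standard library. It mirrors the curl request above and reads `SPIDER_API_KEY` from the environment; the request object is built but not sent, so you can inspect it without an API key:

```python
import json
import os
import urllib.request

def crawl_request(url, limit=50, return_format="markdown"):
    """Build the POST /crawl request shown in the curl example."""
    body = json.dumps({
        "url": url,
        "limit": limit,
        "return_format": return_format,
    }).encode()
    return urllib.request.Request(
        "https://api.spider.cloud/crawl",
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ.get('SPIDER_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = crawl_request("https://example.com")
# To send it: urllib.request.urlopen(req).read()
```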

Output formats

One crawl, multiple formats. Switch with a single parameter.


HTML

Raw or cleaned HTML with optional tag filtering


Markdown

Clean Markdown ideal for LLMs and RAG pipelines


Plain Text

Stripped text content with no markup


JSON

Structured extraction via AI-powered schemas

Screenshot

Full-page PNG or viewport captures

PDF

Browser-rendered PDF exports of any page
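Switching formats really is a one-parameter change. The sketch below builds one request body per format; `return_format` comes from the curl example above, but the values other than "markdown" are illustrative guesses, not confirmed API constants:

```python
import json

# Assumed format identifiers; only "markdown" is confirmed on this page.
FORMATS = ["raw", "markdown", "text", "json"]

def bodies_for(url, formats=FORMATS):
    """One crawl config per output format, differing in a single field."""
    return [json.dumps({"url": url, "limit": 10, "return_format": f})
            for f in formats]

for body in bodies_for("https://example.com"):
    print(body)
```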

Built for scale

Infrastructure that handles millions of pages.

Elastic concurrency

Auto-scales concurrent connections to match your crawl volume. 10 pages to 10 million, same code.

Proxy rotation

Automatic IP rotation across residential and datacenter proxies. Spider selects the optimal proxy type per domain.

Anti-bot bypass

Handles Cloudflare, Akamai, PerimeterX, and other detection systems. Fingerprints rotate per request.

HTTP caching

Previously crawled pages are cached and served instantly on repeat requests.

Webhooks

Get notified when crawls complete. Process pages as they're discovered instead of waiting for the full crawl.
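A hypothetical webhook receiver, sketched with the standard library. The payload shape (a JSON object with a `url` field per crawled page) is an assumption for illustration; consult the webhook documentation for the real fields:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class CrawlWebhook(BaseHTTPRequestHandler):
    """Receives one POST per crawled page (assumed delivery model)."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        page = json.loads(self.rfile.read(length))
        print("crawled:", page.get("url"))  # process pages as they arrive
        self.send_response(204)
        self.end_headers()

# To run: HTTPServer(("", 8000), CrawlWebhook).serve_forever()
```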

FAQ

What is web crawling?

Web crawling is the automated process of discovering and fetching web pages by following links. A web crawler starts from one or more seed URLs, downloads the page content, extracts links, and repeats the process. Spider handles this entire workflow through a single API call. You provide a URL and Spider returns structured data from every reachable page.
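The crawl loop described above fits in a few lines. Here `fetch` is a stub over an in-memory "site" rather than real HTTP, purely to show the discover-fetch-extract cycle that Spider runs at scale:

```python
from collections import deque

# Toy link graph standing in for a website.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": [],
    "/c": ["/"],
}

def crawl(seed, fetch=SITE.get):
    """Breadth-first crawl: fetch a page, extract links, repeat."""
    seen, queue, order = {seed}, deque([seed]), []
    while queue:
        page = queue.popleft()
        order.append(page)              # "download" the page
        for link in fetch(page) or []:  # extract its links
            if link not in seen:        # skip already-visited pages
                seen.add(link)
                queue.append(link)
    return order

print(crawl("/"))  # → ['/', '/a', '/b', '/c']
```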

Is web scraping legal?

Web scraping publicly available data is generally legal in the United States, a position supported by the hiQ Labs v. LinkedIn ruling. That said, you should respect robots.txt directives and site terms of service, and avoid scraping personal data without consent. Spider provides built-in robots.txt compliance and rate limiting to help you crawl responsibly.

How many pages can Spider crawl?

Spider can crawl millions of pages per job with no hard limit. Our infrastructure auto-scales to handle crawls of any size, from a single page to an entire domain with hundreds of thousands of URLs. Crawl speed depends on your plan and concurrency settings, with enterprise users reaching 500+ pages per second.

What's the difference between crawling and scraping?

Web crawling is about discovery: following links to find pages across a site. Web scraping is about extraction: pulling specific data from those pages. Spider does both. It crawls your target site to discover all pages, then scrapes each page to return clean, structured data in your chosen format (HTML, Markdown, JSON, and more).

Does Spider handle JavaScript-rendered pages?

Yes. Spider offers three rendering modes: HTTP mode for fast static page fetching, Smart mode that auto-detects whether JavaScript rendering is needed, and Browser mode that uses a full headless browser to render JavaScript-heavy single-page applications. Browser mode supports the same interactions a real user would perform.

Start collecting web data

Free credits on signup, no card required. Crawl your first site in under a minute.