Same crawler. Less ops.
The core crawling engine is identical. What changes is everything around it — who manages the proxies, the browsers, the scaling, and the anti-bot logic.
# you handle everything
cargo add spider
# set up proxy rotation
# configure headless Chrome
# manage server scaling
# handle bans + retries
# write markdown cleaning
# build extraction pipeline
curl https://api.spider.cloud/crawl \
  -H "Authorization: Bearer $KEY" \
  -d '{ "url": "https://example.com", "limit": 100 }'
→ proxies, stealth, scaling handled
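For callers who would rather issue that request from Rust than from the shell, the sketch below assembles the same pieces as the curl call: the endpoint, the Bearer header, and a JSON body with `url` and `limit`. It uses only the standard library and does not send anything; the `CrawlRequest` type, `auth_header` helper, and `CRAWL_ENDPOINT` constant are hypothetical names chosen for illustration, not part of any Spider SDK, and the output would be handed to whichever HTTP client you already use.

```rust
// Sketch only: builds the parts of the crawl request from the curl example.
// The endpoint, header scheme, and the "url"/"limit" fields come from that
// example; every Rust name here is an illustrative assumption.

const CRAWL_ENDPOINT: &str = "https://api.spider.cloud/crawl";

struct CrawlRequest {
    url: String,
    limit: u32,
}

impl CrawlRequest {
    /// Serialize to a JSON body. Only backslashes and double quotes are
    /// escaped, which is enough for ordinary URLs but is not a full JSON
    /// encoder (control characters are not handled).
    fn to_json(&self) -> String {
        let escaped = self.url.replace('\\', "\\\\").replace('"', "\\\"");
        format!("{{\"url\":\"{}\",\"limit\":{}}}", escaped, self.limit)
    }
}

/// Build the Authorization header as a (name, value) pair.
fn auth_header(key: &str) -> (String, String) {
    ("Authorization".to_string(), format!("Bearer {}", key))
}

fn main() {
    // Read the API key from the same $KEY variable the curl example uses.
    let key = std::env::var("KEY").unwrap_or_else(|_| "YOUR_API_KEY".to_string());

    let request = CrawlRequest {
        url: "https://example.com".to_string(),
        limit: 100,
    };
    let (name, _value) = auth_header(&key);

    // Print what would be sent, keeping the key itself out of the output.
    println!("POST {}", CRAWL_ENDPOINT);
    println!("{}: Bearer ***", name);
    println!("{}", request.to_json());
}
```

Keeping the body and header construction separate from the transport means the same two helpers work with a blocking or async client.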
What changes when you go managed.
You size and manage your own fleet.
Cloud → Elastic. 10 pages or 10 million, same API call.
Bring your own. Rotate them yourself.
Cloud → Rotated per-request with real-time block detection.
Basic fingerprint randomization via libraries.
Cloud → Full stealth engine. Fingerprinting baked into every request.
You run headless Chrome or Firefox.
Cloud → Custom Rust browser. Faster, lighter, built for scraping.
html2md library output.
Cloud → Per-site tuning that strips noise for cleaner LLM inputs.
Not included.
Cloud → Send a schema, get structured JSON. No parsers to maintain.
You want full control.
You have your own proxies, your own servers, and a team that can maintain the pipeline. Or you need to run on-prem for compliance reasons.
cargo add spider
You want data, not infrastructure.
Sites are blocking you and you need proxy intelligence. Or you just want to ship faster and skip the ops work entirely.
curl https://api.spider.cloud/crawl
Both paths use the same Rust crawling engine. The open-source library is MIT-licensed and always will be. Spider Cloud is for when you want someone else to handle the hard parts.
Spider Cloud, free balance on signup.
No credit card required. Top up later when you're ready to scale.