Open Source vs Cloud
Same crawler. Less ops.
The core crawling engine is identical. The difference is everything around it: who manages the proxies, the browsers, the scaling, and the anti-bot logic.
# you handle everything
cargo add spider
# set up proxy rotation
# configure headless Chrome
# manage server scaling
# deal with bans + retries
# write markdown cleaning
# build extraction pipeline

curl https://api.spider.cloud/crawl \
  -H "Authorization: Bearer $KEY" \
  -d '{
    "url": "https://example.com",
    "limit": 100
  }'
✓ proxies, stealth, scaling handled

What changes when you go managed
Self-host it
You want full control. You have your own proxies, your own servers, and a team that can maintain the pipeline. Or you need to run on-prem for compliance.
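On this path you pull in the library with `cargo add spider` and drive the crawl yourself. A minimal sketch, based on the spider crate's documented usage (exact method names and re-exports can vary by version, and this needs network access to run):

```rust
// Minimal self-hosted crawl with the spider crate.
// `spider::tokio` is the crate's re-exported async runtime.
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    // Configure a crawl rooted at one site.
    let mut website = Website::new("https://example.com");

    // Crawl the site; proxies, retries, and scaling are yours to wire up.
    website.crawl().await;

    // Print every link the crawler visited.
    for link in website.get_links() {
        println!("{}", link.as_ref());
    }
}
```

Everything the checklist above mentions — proxy rotation, headless Chrome, ban handling — layers on top of this loop and is your responsibility to configure.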
cargo add spider

Let us run it
You want data, not infrastructure. Sites are blocking you and you need proxy intelligence. Or you just want to ship faster and skip the ops work.
curl https://api.spider.cloud/crawl

Both paths use the same Rust crawling engine. The open source library is MIT-licensed and always will be. Spider Cloud is for when you want someone else to handle the hard parts.
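The curl call maps onto any HTTP client. A dependency-free sketch of the same request in Rust — the helper name and the `SPIDER_API_KEY` environment variable are illustrative, not part of any Spider SDK; a real client would use an HTTP crate like reqwest and serde_json instead of hand-formatted JSON:

```rust
use std::env;

/// Assemble the pieces of the crawl request shown in the curl example:
/// endpoint, Authorization header, and JSON body.
/// (Illustrative helper; not an official Spider Cloud client.)
fn build_crawl_request(url: &str, limit: u32) -> (String, String, String) {
    // API key comes from the environment, mirroring `Bearer $KEY` above.
    let key = env::var("SPIDER_API_KEY").unwrap_or_default();
    let endpoint = "https://api.spider.cloud/crawl".to_string();
    let auth_header = format!("Authorization: Bearer {key}");
    // Hand-rolled JSON keeps the sketch free of dependencies.
    let body = format!(r#"{{"url":"{url}","limit":{limit}}}"#);
    (endpoint, auth_header, body)
}

fn main() {
    let (endpoint, _auth, body) = build_crawl_request("https://example.com", 100);
    println!("POST {endpoint}");
    println!("{body}");
}
```

From here, sending the request is one call to whatever HTTP client you already use; the response handling, proxying, and retries stay on Spider Cloud's side.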