Crawl API Reference

Point Spider at any URL and it recursively follows links to discover pages across that domain. Results stream back as they are found, so you can process pages before the full crawl completes. Set a depth limit, a page cap, and an output format to control exactly what comes back.
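The following minimal breadth-first model shows how a depth limit, a page cap, and duplicate detection interact. It is not Spider's implementation. The `crawl` function, its parameters, and the in-memory `links` graph that stands in for fetched pages are all hypothetical. The model also assumes a link is only followed when it stays on the same host as the page it came from.

```python
from collections import deque
from typing import Dict, Iterator, List, Tuple
from urllib.parse import urlparse


def crawl(
    seeds: List[str],
    links: Dict[str, List[str]],
    max_depth: int,
    page_limit: int,
) -> Iterator[Tuple[str, int]]:
    """Yield (url, depth) pairs breadth-first from one or more seed URLs.

    `links` is a hypothetical stand-in for fetching a page and extracting
    its outgoing links; a real crawler would issue HTTP requests here.
    """
    seen = set()
    queue = deque()
    for seed in seeds:  # batch of seed URLs, duplicates ignored
        if seed not in seen:
            seen.add(seed)
            queue.append((seed, 0))

    emitted = 0
    while queue and emitted < page_limit:  # page cap
        url, depth = queue.popleft()
        yield url, depth  # handed to the caller immediately (streaming)
        emitted += 1
        if depth >= max_depth:  # depth limit: do not expand further
            continue
        host = urlparse(url).netloc
        for nxt in links.get(url, []):
            # Stay on the same host and skip URLs already queued or emitted.
            if urlparse(nxt).netloc == host and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))


# Hypothetical site: each key links to the URLs in its list.
example_links = {
    "https://example.com/": [
        "https://example.com/a",
        "https://example.com/b",
        "https://other.org/x",
    ],
    "https://example.com/a": ["https://example.com/b", "https://example.com/c"],
    "https://example.com/b": ["https://example.com/"],
    "https://example.com/c": ["https://example.com/d"],
}

for url, depth in crawl(["https://example.com/"], example_links, max_depth=2, page_limit=10):
    print(depth, url)
```

With `max_depth=2` this model emits `/`, `/a`, `/b`, and `/c`. It never reaches `/d` because that page sits at depth 3. It skips `other.org` as off-domain and queues `/b` only once, even though two pages link to it. Lowering `page_limit` to 2 stops the model after `/` and `/a`.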

Key capabilities

Recursive link following with configurable depth
Streaming JSONL output for real-time processing
Markdown, HTML, plain text, or raw byte output
Page limit controls to stay within budget
Automatic duplicate URL detection
Batch multiple seed URLs in one request
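JSONL output carries one JSON object per line, so a client can act on each page as soon as its line arrives instead of waiting for the whole crawl to finish. The sketch below applies that pattern to any iterable of text lines. The `iter_pages` helper and the `url`/`content` field names are hypothetical, so check the actual response schema before relying on them.

```python
import io
import json
from typing import Any, Dict, Iterable, Iterator


def iter_pages(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Decode a JSONL stream one record at a time as lines arrive."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        yield json.loads(line)


# Hypothetical stream: the "url" and "content" field names are assumptions.
sample = (
    '{"url": "https://example.com/", "content": "# Home"}\n'
    "\n"
    '{"url": "https://example.com/a", "content": "# Page A"}\n'
)

for page in iter_pages(io.StringIO(sample)):
    print(page["url"], len(page["content"]))
```

The `io.StringIO` here only simulates the stream. In practice you would pass the line iterator of your HTTP client's streaming response, so each page is decoded while later pages are still being crawled.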