Per-website scrapers that build themselves.
On the first call, AI discovers the selectors, render mode, and a schema for the page. Subsequent calls hit the cache. Each domain becomes its own JSON endpoint.
Three-layer cache hierarchy.
Lookups get faster the more a config is used. The first request bootstraps the config; every request after that is served from cache.
Memory cache
User-specific config first, then shared/public config. Sub-millisecond.
Database lookup
Queries the config database for a matching domain + path. Includes community-discovered configs.
AI discovery
AI crawls the page, analyzes structure, and generates an optimal scraper config. Only happens once per domain/path.
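The lookup order above can be sketched as a small resolver. This is only an illustration of the fallback logic, not the service's actual implementation: `ConfigResolver`, `db_lookup`, `discover`, and the source labels are hypothetical names, and the database and AI layers are stand-in callables.

```python
from typing import Callable, Dict, Optional, Tuple

Config = dict  # hypothetical config shape; the real format is not shown here


class ConfigResolver:
    """Resolve a scraper config by trying the cheapest layer first."""

    def __init__(
        self,
        db_lookup: Callable[[str, str], Optional[Config]],
        discover: Callable[[str, str], Config],
    ) -> None:
        self.user_cache: Dict[Tuple[str, str, str], Config] = {}
        self.shared_cache: Dict[Tuple[str, str], Config] = {}
        self.db_lookup = db_lookup  # stand-in for the config database
        self.discover = discover    # stand-in for AI discovery

    def resolve(self, user: str, domain: str, path: str) -> Tuple[Config, str]:
        # Layer 1: memory cache, user-specific config first, then shared.
        config = self.user_cache.get((user, domain, path))
        if config is not None:
            return config, "memory:user"
        config = self.shared_cache.get((domain, path))
        if config is not None:
            return config, "memory:shared"

        # Layer 2: database lookup for a matching domain + path.
        config = self.db_lookup(domain, path)
        source = "database"
        if config is None:
            # Layer 3: AI discovery, reached only when nothing is stored yet.
            config = self.discover(domain, path)
            source = "discovery"

        # Keep the result in memory so later calls stop at layer 1.
        self.shared_cache[(domain, path)] = config
        return config, source


# Demo with fake layers: an empty database and a discovery stub.
calls = []


def fake_discover(domain: str, path: str) -> Config:
    calls.append((domain, path))
    return {"selectors": {"title": "h1"}}


resolver = ConfigResolver(db_lookup=lambda d, p: None, discover=fake_discover)
first = resolver.resolve("alice", "example.com", "/")
second = resolver.resolve("alice", "example.com", "/")
print(first[1], second[1])
```

In this sketch the first call falls through to discovery and the second is answered from memory, so the discovery stub runs once for the domain/path pair.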
When to use which.
Use Fetch when
- You want structured data without writing CSS selectors
- You're scraping a site repeatedly and want cached configs
- You need AI to figure out the best extraction approach
- You want community-validated scraper configs
Use Scrape when
- You already know the exact CSS selectors you need
- You want full control over extraction settings
- You need raw markdown, HTML, or text output
- You're doing a one-off extraction
What the AI figures out.
CSS selectors
AI discovers the right selectors for titles, prices, descriptions, images, and other structured fields on each page type.
Request mode
Determines whether a page needs JavaScript rendering (chrome), stealth mode, or works with plain HTTP.
Scroll & wait
Detects lazy-loaded content that requires scrolling or waiting for specific elements to appear before extraction.
Extraction schema
Generates a JSON schema describing the structured data that can be extracted from the page.
Content filtering
Sets root selectors for main content and exclude selectors to skip ads, navigation, and footers.
Confidence scoring
Each config gets a confidence score. Configs are validated over time and improved as more users access the same endpoint.
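To make the six items above concrete, here is one way a discovered config could be modeled. The actual config format is not documented in this section, so every field name below is an assumption, as are the `"http"` and `"stealth"` mode labels and the 0 to 1 confidence range; only `chrome` appears in the text above.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional

REQUEST_MODES = {"http", "chrome", "stealth"}  # assumed labels


@dataclass
class ScraperConfig:
    """Hypothetical shape of an AI-discovered config (illustrative only)."""

    selectors: Dict[str, str]                  # field name -> CSS selector
    request_mode: str = "http"                 # plain HTTP, chrome, or stealth
    scroll: bool = False                       # scroll to trigger lazy loading
    wait_for: Optional[str] = None             # selector to wait for, if any
    schema: Dict[str, object] = field(default_factory=dict)  # JSON schema
    root_selector: Optional[str] = None        # main content root
    exclude_selectors: List[str] = field(default_factory=list)  # ads, nav, footers
    confidence: float = 0.0                    # assumed to lie in [0, 1]

    def __post_init__(self) -> None:
        if self.request_mode not in REQUEST_MODES:
            raise ValueError(f"unknown request mode: {self.request_mode!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


# Hand-written example for a product page; the selectors are made up.
config = ScraperConfig(
    selectors={"title": "h1.product-title", "price": "span.price"},
    request_mode="chrome",
    scroll=True,
    wait_for="div.reviews",
    schema={
        "type": "object",
        "properties": {"title": {"type": "string"}, "price": {"type": "string"}},
    },
    root_selector="main",
    exclude_selectors=["nav", "footer", ".ad"],
    confidence=0.9,
)
print(asdict(config)["request_mode"])
```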
Parameters at a glance.
/fetch/{domain}/{path}
URL parameters
- domain: Target website domain (e.g. news.ycombinator.com)
- path: Page path to scrape (e.g. /newest or /)
Body parameters (all optional)
AI handles extraction automatically. These only control output format and crawl behavior.
- return_format: json (default), markdown, html, or text
- limit: Number of pages to crawl (default 1, max 100)
- readability: Strip navigation, ads, sidebars. Returns main content only.
cURL, Python, Node.
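The parameter rules above can also be checked client-side before a request is sent. The helper below is a minimal sketch, not part of any official SDK: `build_fetch_request` is a hypothetical name, and treating `readability` as a boolean flag is an assumption, since the parameter list does not state its type.

```python
from typing import Any, Dict, Optional, Tuple

BASE_URL = "https://api.spider.cloud/fetch"  # base of the endpoint documented above
RETURN_FORMATS = {"json", "markdown", "html", "text"}


def build_fetch_request(
    domain: str,
    path: str = "/",
    return_format: str = "json",
    limit: int = 1,
    readability: Optional[bool] = None,  # assumed to be a boolean flag
) -> Tuple[str, Dict[str, Any]]:
    """Return the (url, json_body) pair for a Fetch call, validating inputs."""
    if return_format not in RETURN_FORMATS:
        raise ValueError(f"return_format must be one of {sorted(RETURN_FORMATS)}")
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")

    # "/" maps to the domain root; "/newest" maps to ".../newest".
    url = f"{BASE_URL}/{domain}/{path.lstrip('/')}"
    body: Dict[str, Any] = {"return_format": return_format, "limit": limit}
    if readability is not None:
        body["readability"] = readability
    return url, body


url, body = build_fetch_request("news.ycombinator.com", "/newest", limit=5)
print(url)
print(body)
```

The returned pair can be passed straight to `requests.post(url, json=body, headers=...)` with the same Authorization header used in the Python example.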
import os

import requests

response = requests.post(
    "https://api.spider.cloud/fetch/news.ycombinator.com/",
    headers={
        "Authorization": "Bearer " + os.environ["SPIDER_API_KEY"],
        "Content-Type": "application/json",
    },
    json={"return_format": "json"},
)
data = response.json()
print(data)
What you get back.
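A parsed response can be reduced to the documented fields with a few lines of Python. This is a sketch under stated assumptions: this section does not say whether the body is a single object or a list of page objects, so the hypothetical `summarize_pages` helper accepts both, and the sample below is hand-written, not real API output.

```python
from typing import Any, Dict, List


def summarize_pages(data: Any) -> List[Dict[str, Any]]:
    """Pick out the documented response fields from one or many page objects."""
    pages = data if isinstance(data, list) else [data]
    summaries = []
    for page in pages:
        if not isinstance(page, dict):
            continue  # skip anything that is not a page object
        metadata = page.get("metadata")
        if not isinstance(metadata, dict):
            metadata = {}
        summaries.append(
            {
                "url": page.get("url"),
                "ok": page.get("status") == 200,
                "title": metadata.get("title"),
                "fields": page.get("css_extracted") or {},
                "link_count": len(page.get("links") or []),
            }
        )
    return summaries


# Hand-written sample using the documented field names (not real output).
sample = {
    "url": "https://news.ycombinator.com/",
    "content": "...",
    "status": 200,
    "metadata": {"title": "Hacker News"},
    "css_extracted": {"titles": ["Example story"]},
    "links": ["https://news.ycombinator.com/newest"],
}
summary = summarize_pages(sample)
print(summary)
```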
- url: The final URL after any redirects
- content: Extracted data in your chosen format
- status: HTTP status code of the response
- metadata: Page title, description, keywords, og:image
- css_extracted: Structured data from AI-discovered selectors (JSON format)
- links: All links found on the page
Browse pre-configured scrapers.
Every time someone uses the Fetch API on a new domain, the AI-discovered config is validated and made available in the public directory. Browse available scrapers, see what fields they extract, and use them instantly.
Build scrapers without writing selectors.
Point the Fetch API at any website and get structured data back. AI handles the configuration.