AI API Reference
Spider's AI endpoints enhance standard web data extraction with natural language understanding. Each AI endpoint accepts all parameters from its corresponding standard endpoint (e.g., /ai/crawl accepts all /crawl params), plus AI-specific parameters like prompt and extraction_schema.
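For example, an /ai/crawl payload can freely mix standard /crawl parameters with the AI-specific ones. A minimal sketch of building such a body (no request is sent here; the parameter values are illustrative):

```python
import json

# "limit" and "return_format" are standard /crawl parameters;
# "prompt" is AI-specific. AI endpoints accept both in one body.
payload = {
    "url": "https://example.com",
    "limit": 25,                    # standard /crawl parameter
    "return_format": "markdown",    # standard /crawl parameter
    "prompt": "Extract every product name and price",  # AI-specific
}

# Serialize to the JSON body you would POST to /ai/crawl.
body = json.dumps(payload)
print(body)
```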
AI Crawl
https://api.spider.cloud/ai/crawl
Crawl websites intelligently using natural language prompts. Accepts all /crawl endpoint parameters plus AI-specific ones. The AI analyzes your prompt to determine crawl depth, page filtering, and content extraction strategies.
Parameters
| Name | Type | Status | Description |
|---|---|---|---|
| url | string | required | Starting URL to crawl |
| prompt | string | required | Natural language instruction for what to crawl and extract |
| limit | number | optional | Maximum pages to crawl |
| return_format | string | optional | Output format: markdown, html, text, or raw. Defaults to empty (only extracted data returned) |
| extraction_schema | object | optional | JSON Schema for structured extraction with name, description, and schema fields |
| metadata | boolean | optional | Include metadata with extracted_data in response |
| cleaning_intent | "extraction" \| "action" \| "general" | optional | Smart HTML cleaning: extraction (aggressive), action (preserves interactivity), general (balanced) |
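As the /ai/scrape example later on this page shows, the `schema` field inside `extraction_schema` is a JSON Schema document serialized as a string. A sketch of building one programmatically instead of hand-escaping quotes (the "BlogPost" schema here is illustrative):

```python
import json

# Define the JSON Schema as a plain dict, then serialize it:
# extraction_schema.schema expects a string, not a nested object.
blog_post_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "summary": {"type": "string"},
    },
    "required": ["title"],
}

extraction_schema = {
    "name": "BlogPost",
    "description": "Title and summary of a blog post",
    "schema": json.dumps(blog_post_schema),
}

# Round-trip check: the string decodes back to the original schema.
assert json.loads(extraction_schema["schema"]) == blog_post_schema
```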
Example Request
```shell
curl -X POST https://api.spider.cloud/ai/crawl \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "prompt": "Find all blog posts and extract titles and summaries",
    "limit": 50
  }'
```

```python
import requests

response = requests.post(
    "https://api.spider.cloud/ai/crawl",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    json={
        "url": "https://example.com",
        "prompt": "Find all blog posts and extract titles and summaries",
        "limit": 50
    }
)
print(response.json())
```

Example Response
```json
[
  {
    "content": null,
    "costs": {
      "ai_cost": 0.002,
      "total_cost": 0.003
    },
    "duration_elapsed_ms": 2150,
    "error": null,
    "metadata": {
      "extracted_data": {
        "title": "Getting Started with AI",
        "summary": "An introduction to artificial intelligence..."
      }
    },
    "status": 200,
    "url": "https://example.com/blog/post-1"
  }
]
```

AI Scrape
https://api.spider.cloud/ai/scrape
Extract structured data from any webpage using plain English prompts. Accepts all /scrape endpoint parameters plus AI-specific ones. AI automatically identifies and extracts the data you describe. Use extraction_schema for typed JSON output.
Parameters
| Name | Type | Status | Description |
|---|---|---|---|
| url | string | required | URL to scrape |
| prompt | string | required | Natural language description of data to extract |
| return_format | string | optional | Output format: json, markdown, raw, html, text. Defaults to empty (only extracted data returned) |
| extraction_schema | object | optional | JSON Schema for structured extraction with name, description, and schema fields |
| metadata | boolean | optional | Include metadata with extracted_data in response |
| request | string | optional | Request type: http or chrome for JavaScript rendering |
| cleaning_intent | "extraction" \| "action" \| "general" | optional | Smart HTML cleaning: extraction (aggressive), action (preserves interactivity), general (balanced) |
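Responses come back as a list of page objects with the extracted fields under `metadata.extracted_data` (see the example response further down). A minimal sketch of pulling them out, using a hard-coded stand-in for `response.json()`:

```python
# Stand-in for response.json(); shaped like the example response below.
results = [
    {
        "status": 200,
        "error": None,
        "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
        "metadata": {
            "extracted_data": {
                "title": "A Light in the Attic",
                "price": "£51.77",
            }
        },
    }
]

# Keep only successful pages that carry extracted data.
extracted = [
    item["metadata"]["extracted_data"]
    for item in results
    if item["status"] == 200 and item.get("metadata")
]
print(extracted[0]["title"])
```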
Example Request
```shell
curl -X POST https://api.spider.cloud/ai/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
    "prompt": "Extract book details",
    "extraction_schema": {
      "name": "BookDetails",
      "description": "Product information from a book listing",
      "schema": "{\"type\":\"object\",\"properties\":{\"title\":{\"type\":\"string\"},\"price\":{\"type\":\"string\"},\"availability\":{\"type\":\"string\"},\"description\":{\"type\":\"string\"}},\"required\":[\"title\",\"price\"]}"
    },
    "request": "chrome"
  }'
```

```python
import requests

response = requests.post(
    "https://api.spider.cloud/ai/scrape",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    json={
        "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
        "prompt": "Extract book details",
        "extraction_schema": {
            "name": "BookDetails",
            "description": "Product information from a book listing",
            "schema": "{\"type\":\"object\",\"properties\":{\"title\":{\"type\":\"string\"},\"price\":{\"type\":\"string\"},\"availability\":{\"type\":\"string\"},\"description\":{\"type\":\"string\"}},\"required\":[\"title\",\"price\"]}"
        },
        "request": "chrome"
    }
)
print(response.json())
```

Example Response
```json
[
  {
    "content": null,
    "costs": {
      "ai_cost": 0,
      "ai_cost_formatted": "0",
      "bytes_transferred_cost": 0.000009658,
      "compute_cost": 0.000006366,
      "total_cost": 0.000017,
      "total_cost_formatted": "0.000017"
    },
    "duration_elapsed_ms": 3824,
    "error": null,
    "metadata": {
      "extracted_data": {
        "title": "A Light in the Attic",
        "price": "£51.77",
        "availability": "In stock (22 available)",
        "upc": "a897fe39b1053632",
        "product_type": "Books"
      }
    },
    "status": 200,
    "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html"
  }
]
```

AI Search
https://api.spider.cloud/ai/search
Search the web with AI-optimized queries. Accepts all /search endpoint parameters plus AI-specific ones. AI converts natural language to optimized search keywords and can automatically fetch and extract content from results.
Parameters
| Name | Type | Status | Description |
|---|---|---|---|
| prompt | string | required | Natural language search request - AI optimizes into search keywords |
| num | number | optional | Number of search results (AI determines optimal if not set) |
| fetch_page_content | boolean | optional | Fetch and extract content from search results (AI determines if needed) |
| return_format | string | optional | Output format when fetching content: markdown, html, text, raw |
| extraction_schema | object | optional | JSON Schema for structured extraction from fetched pages |
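Both `num` and `fetch_page_content` are optional and AI-chosen when omitted; a sketch of a payload that pins them explicitly instead (field names from the table above, values illustrative):

```python
import json

payload = {
    "prompt": "Recent benchmarks comparing Python async HTTP clients",
    "num": 5,                    # fixed result count instead of AI-chosen
    "fetch_page_content": True,  # fetch and extract each result's page
    "return_format": "markdown", # format for the fetched page content
}
body = json.dumps(payload)
print(body)
```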
Example Request
```shell
curl -X POST https://api.spider.cloud/ai/search \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Find the best Python web scraping libraries with async support and good documentation"
  }'
```

```python
import requests

response = requests.post(
    "https://api.spider.cloud/ai/search",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    json={
        "prompt": "Find the best Python web scraping libraries with async support and good documentation"
    }
)
print(response.json())
```

Example Response
```json
[
  {
    "content": "# Async Web Scraping in Python...",
    "costs": {
      "ai_cost": 0.003,
      "total_cost": 0.004
    },
    "duration_elapsed_ms": 4500,
    "error": null,
    "metadata": {
      "extracted_data": {
        "libraries": [
          "httpx",
          "aiohttp",
          "playwright"
        ],
        "features": [
          "async support",
          "good docs"
        ]
      }
    },
    "status": 200,
    "url": "https://example.com/python-scraping"
  }
]
```

AI Browser
https://api.spider.cloud/ai/browser
Automate browser interactions using natural language. Accepts all browser automation parameters plus AI-specific ones. Describe actions in plain English and AI configures the automation.
Parameters
| Name | Type | Status | Description |
|---|---|---|---|
| url | string | required | URL to automate |
| prompt | string | required | Natural language description of browser actions |
| wait_for | number | optional | Wait time between actions in ms |
| screenshot | boolean | optional | Capture screenshot after actions |
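When `screenshot` is true, the image comes back base64-encoded under `metadata.screenshot` (see the example response below). A sketch of decoding it to a PNG file, using a dummy stand-in for the real response value:

```python
import base64

# Stand-in for metadata["screenshot"] from a real response; the API
# returns the image as a base64 string.
screenshot_b64 = base64.b64encode(b"\x89PNG\r\n\x1a\n...").decode("ascii")

# Decode and write the image to disk.
png_bytes = base64.b64decode(screenshot_b64)
with open("screenshot.png", "wb") as f:
    f.write(png_bytes)
```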
Example Request
```shell
curl -X POST https://api.spider.cloud/ai/browser \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/login",
    "prompt": "Click the sign in button, wait for the form, fill email with test@example.com",
    "wait_for": 2000
  }'
```

```python
import requests

response = requests.post(
    "https://api.spider.cloud/ai/browser",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    json={
        "url": "https://example.com/login",
        "prompt": "Click the sign in button, wait for the form, fill email with test@example.com",
        "wait_for": 2000
    }
)
print(response.json())
```

Example Response
```json
[
  {
    "content": "<html>...</html>",
    "costs": {
      "ai_cost": 0.005,
      "total_cost": 0.006
    },
    "duration_elapsed_ms": 5200,
    "error": null,
    "metadata": {
      "extracted_data": {
        "steps": [
          "clicked: sign in button",
          "waited: 1000ms",
          "filled: email field"
        ]
      },
      "screenshot": "base64..."
    },
    "status": 200,
    "url": "https://example.com/login"
  }
]
```

AI Links
https://api.spider.cloud/ai/links
Extract and filter links from webpages using AI guidance. Accepts all /links endpoint parameters plus AI-specific ones. Describe the types of links you want to find and the AI will intelligently filter and categorize them.
Parameters
| Name | Type | Status | Description |
|---|---|---|---|
| url | string | required | URL to extract links from |
| prompt | string | required | Natural language description of what links to find |
| limit | number | optional | Maximum links to return |
| depth | number | optional | Crawl depth for finding links |
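The links arrive as a list of `{href, text}` objects under `metadata.extracted_data.links` (see the example response below). A sketch of deduplicating them by `href` while preserving order, with hard-coded sample data standing in for a real response:

```python
# Stand-in for metadata["extracted_data"]["links"] from a real response.
links = [
    {"href": "https://books.toscrape.com/index.html", "text": "Books to Scrape"},
    {"href": "https://books.toscrape.com/index.html", "text": "Home"},
    {"href": "https://books.toscrape.com/category/books_1/index.html", "text": "Books"},
]

# Keep the first occurrence of each href, preserving order.
seen = set()
unique = []
for link in links:
    if link["href"] not in seen:
        seen.add(link["href"])
        unique.append(link)

print(unique)  # two entries: the duplicate index.html link is dropped
```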
Example Request
```shell
curl -X POST https://api.spider.cloud/ai/links \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
    "prompt": "Find all links to product pages and documentation",
    "limit": 100
  }'
```

```python
import requests

response = requests.post(
    "https://api.spider.cloud/ai/links",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    json={
        "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
        "prompt": "Find all links to product pages and documentation",
        "limit": 100
    }
)
print(response.json())
```

Example Response
```json
[
  {
    "content": null,
    "costs": {
      "ai_cost": 0.000193,
      "total_cost": 0.000194
    },
    "duration_elapsed_ms": 299,
    "error": null,
    "metadata": {
      "extracted_data": {
        "links": [
          {
            "href": "https://books.toscrape.com/index.html",
            "text": "Books to Scrape"
          },
          {
            "href": "https://books.toscrape.com/category/books_1/index.html",
            "text": "Books"
          },
          {
            "href": "https://books.toscrape.com/category/books/poetry_23/index.html",
            "text": "Poetry"
          }
        ]
      }
    },
    "status": 200,
    "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html"
  }
]
```