AI API Reference
Spider's AI endpoints enhance standard web data extraction with natural language understanding. Each AI endpoint accepts all parameters from its corresponding standard endpoint (e.g., /ai/crawl accepts all /crawl params), plus AI-specific parameters like prompt and extraction_schema.
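For example, an /ai/crawl payload can freely mix standard /crawl parameters with the AI-specific ones. A minimal sketch of building such a body (no request is sent here; the parameter values are illustrative):

```python
import json

# "limit" and "return_format" are standard /crawl parameters;
# "prompt" is AI-specific. AI endpoints accept both in one body.
payload = {
    "url": "https://example.com",
    "limit": 25,                    # standard /crawl parameter
    "return_format": "markdown",    # standard /crawl parameter
    "prompt": "Extract every product name and price",  # AI-specific
}

# Serialize to the JSON body you would POST to /ai/crawl.
body = json.dumps(payload)
print(body)
```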
AI Crawl
https://api.spider.cloud/ai/crawl
Crawl websites intelligently using natural language prompts. Accepts all /crawl endpoint parameters plus AI-specific ones. The AI analyzes your prompt to determine crawl depth, page filtering, and content extraction strategies.
Parameters
| Name | Type | Status | Description |
|---|---|---|---|
| url | string | required | Starting URL to crawl |
| prompt | string | required | Natural language instruction for what to crawl and extract |
| limit | number | optional | Maximum pages to crawl |
| return_format | string | optional | Output format: markdown, html, text, or raw. Defaults to empty (only extracted data returned) |
| extraction_schema | object | optional | JSON Schema for structured extraction with name, description, and schema fields |
| metadata | boolean | optional | Include metadata with extracted_data in response |
| cleaning_intent | "extraction" \| "action" \| "general" | optional | Smart HTML cleaning: extraction (aggressive), action (preserves interactivity), general (balanced) |
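As the /ai/scrape example later on this page shows, the `schema` field inside `extraction_schema` is a JSON Schema document serialized as a string. A sketch of building one programmatically instead of hand-escaping quotes (the "BlogPost" schema here is illustrative):

```python
import json

# Define the JSON Schema as a plain dict, then serialize it:
# extraction_schema.schema expects a string, not a nested object.
blog_post_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "summary": {"type": "string"},
    },
    "required": ["title"],
}

extraction_schema = {
    "name": "BlogPost",
    "description": "Title and summary of a blog post",
    "schema": json.dumps(blog_post_schema),
}

# Round-trip check: the string decodes back to the original schema.
assert json.loads(extraction_schema["schema"]) == blog_post_schema
```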
Example Request
```shell
curl -X POST https://api.spider.cloud/ai/crawl \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "prompt": "Find all blog posts and extract titles and summaries",
    "limit": 50
  }'
```

```python
import requests

response = requests.post(
    "https://api.spider.cloud/ai/crawl",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    json={
        "url": "https://example.com",
        "prompt": "Find all blog posts and extract titles and summaries",
        "limit": 50
    }
)
print(response.json())
```

Example Response
```json
[
  {
    "content": null,
    "costs": {
      "ai_cost": 0.002,
      "total_cost": 0.003
    },
    "duration_elapsed_ms": 2150,
    "error": null,
    "metadata": {
      "extracted_data": {
        "title": "Getting Started with AI",
        "summary": "An introduction to artificial intelligence..."
      }
    },
    "status": 200,
    "url": "https://example.com/blog/post-1"
  }
]
```

AI Scrape
https://api.spider.cloud/ai/scrape
Extract structured data from any webpage using plain English prompts. Accepts all /scrape endpoint parameters plus AI-specific ones. AI automatically identifies and extracts the data you describe. Use extraction_schema for typed JSON output.
Parameters
| Name | Type | Status | Description |
|---|---|---|---|
| url | string | required | URL to scrape |
| prompt | string | required | Natural language description of data to extract |
| return_format | string | optional | Output format: json, markdown, raw, html, text. Defaults to empty (only extracted data returned) |
| extraction_schema | object | optional | JSON Schema for structured extraction with name, description, and schema fields |
| metadata | boolean | optional | Include metadata with extracted_data in response |
| request | string | optional | Request type: http or chrome for JavaScript rendering |
| cleaning_intent | "extraction" \| "action" \| "general" | optional | Smart HTML cleaning: extraction (aggressive), action (preserves interactivity), general (balanced) |
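Responses come back as a list of page objects with the extracted fields under `metadata.extracted_data` (see the example response further down). A minimal sketch of pulling them out, using a hard-coded stand-in for `response.json()`:

```python
# Stand-in for response.json(); shaped like the example response below.
results = [
    {
        "status": 200,
        "error": None,
        "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
        "metadata": {
            "extracted_data": {
                "title": "A Light in the Attic",
                "price": "£51.77",
            }
        },
    }
]

# Keep only successful pages that carry extracted data.
extracted = [
    item["metadata"]["extracted_data"]
    for item in results
    if item["status"] == 200 and item.get("metadata")
]
print(extracted[0]["title"])
```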
Example Request
```shell
curl -X POST https://api.spider.cloud/ai/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
    "prompt": "Extract book details",
    "extraction_schema": {
      "name": "BookDetails",
      "description": "Product information from a book listing",
      "schema": "{\"type\":\"object\",\"properties\":{\"title\":{\"type\":\"string\"},\"price\":{\"type\":\"string\"},\"availability\":{\"type\":\"string\"},\"description\":{\"type\":\"string\"}},\"required\":[\"title\",\"price\"]}"
    },
    "request": "chrome"
  }'
```

```python
import requests

response = requests.post(
    "https://api.spider.cloud/ai/scrape",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    json={
        "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
        "prompt": "Extract book details",
        "extraction_schema": {
            "name": "BookDetails",
            "description": "Product information from a book listing",
            "schema": "{\"type\":\"object\",\"properties\":{\"title\":{\"type\":\"string\"},\"price\":{\"type\":\"string\"},\"availability\":{\"type\":\"string\"},\"description\":{\"type\":\"string\"}},\"required\":[\"title\",\"price\"]}"
        },
        "request": "chrome"
    }
)
print(response.json())
```

Example Response
```json
[
  {
    "content": null,
    "costs": {
      "ai_cost": 0,
      "ai_cost_formatted": "0",
      "bytes_transferred_cost": 0.000009658,
      "compute_cost": 0.000006366,
      "total_cost": 0.000017,
      "total_cost_formatted": "0.000017"
    },
    "duration_elapsed_ms": 3824,
    "error": null,
    "metadata": {
      "extracted_data": {
        "title": "A Light in the Attic",
        "price": "£51.77",
        "availability": "In stock (22 available)",
        "upc": "a897fe39b1053632",
        "product_type": "Books"
      }
    },
    "status": 200,
    "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html"
  }
]
```

AI Search
https://api.spider.cloud/ai/search
Search the web with AI-optimized queries. Accepts all /search endpoint parameters plus AI-specific ones. AI converts natural language to optimized search keywords and can automatically fetch and extract content from results.
Parameters
| Name | Type | Status | Description |
|---|---|---|---|
| prompt | string | required | Natural language search request - AI optimizes into search keywords |
| num | number | optional | Number of search results (AI determines optimal if not set) |
| fetch_page_content | boolean | optional | Fetch and extract content from search results (AI determines if needed) |
| return_format | string | optional | Output format when fetching content: markdown, html, text, raw |
| extraction_schema | object | optional | JSON Schema for structured extraction from fetched pages |
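Both `num` and `fetch_page_content` are optional and AI-chosen when omitted; a sketch of a payload that pins them explicitly instead (field names from the table above, values illustrative):

```python
import json

payload = {
    "prompt": "Recent benchmarks comparing Python async HTTP clients",
    "num": 5,                    # fixed result count instead of AI-chosen
    "fetch_page_content": True,  # fetch and extract each result's page
    "return_format": "markdown", # format for the fetched page content
}
body = json.dumps(payload)
print(body)
```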
Example Request
```shell
curl -X POST https://api.spider.cloud/ai/search \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Find the best Python web scraping libraries with async support and good documentation"
  }'
```

```python
import requests

response = requests.post(
    "https://api.spider.cloud/ai/search",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    json={
        "prompt": "Find the best Python web scraping libraries with async support and good documentation"
    }
)
print(response.json())
```

Example Response
```json
[
  {
    "content": "# Async Web Scraping in Python...",
    "costs": {
      "ai_cost": 0.003,
      "total_cost": 0.004
    },
    "duration_elapsed_ms": 4500,
    "error": null,
    "metadata": {
      "extracted_data": {
        "libraries": [
          "httpx",
          "aiohttp",
          "playwright"
        ],
        "features": [
          "async support",
          "good docs"
        ]
      }
    },
    "status": 200,
    "url": "https://example.com/python-scraping"
  }
]
```

AI Browser
https://api.spider.cloud/ai/browser
Automate browser interactions using natural language. Accepts all browser automation parameters plus AI-specific ones. Describe actions in plain English and AI configures the automation.
Parameters
| Name | Type | Status | Description |
|---|---|---|---|
| url | string | required | URL to automate |
| prompt | string | required | Natural language description of browser actions |
| wait_for | number | optional | Wait time between actions in ms |
| screenshot | boolean | optional | Capture screenshot after actions |
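When `screenshot` is true, the image comes back base64-encoded under `metadata.screenshot` (see the example response below). A sketch of decoding it to a PNG file, using a dummy stand-in for the real response value:

```python
import base64

# Stand-in for metadata["screenshot"] from a real response; the API
# returns the image as a base64 string.
screenshot_b64 = base64.b64encode(b"\x89PNG\r\n\x1a\n...").decode("ascii")

# Decode and write the image to disk.
png_bytes = base64.b64decode(screenshot_b64)
with open("screenshot.png", "wb") as f:
    f.write(png_bytes)
```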
Example Request
```shell
curl -X POST https://api.spider.cloud/ai/browser \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/login",
    "prompt": "Click the sign in button, wait for the form, fill email with test@example.com",
    "wait_for": 2000
  }'
```

```python
import requests

response = requests.post(
    "https://api.spider.cloud/ai/browser",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    json={
        "url": "https://example.com/login",
        "prompt": "Click the sign in button, wait for the form, fill email with test@example.com",
        "wait_for": 2000
    }
)
print(response.json())
```

Example Response
```json
[
  {
    "content": "<html>...</html>",
    "costs": {
      "ai_cost": 0.005,
      "total_cost": 0.006
    },
    "duration_elapsed_ms": 5200,
    "error": null,
    "metadata": {
      "extracted_data": {
        "steps": [
          "clicked: sign in button",
          "waited: 1000ms",
          "filled: email field"
        ]
      },
      "screenshot": "base64..."
    },
    "status": 200,
    "url": "https://example.com/login"
  }
]
```

AI Links
https://api.spider.cloud/ai/links
Extract and filter links from webpages using AI guidance. Accepts all /links endpoint parameters plus AI-specific ones. Describe the types of links you want to find and the AI will intelligently filter and categorize them.
Parameters
| Name | Type | Status | Description |
|---|---|---|---|
| url | string | required | URL to extract links from |
| prompt | string | required | Natural language description of what links to find |
| limit | number | optional | Maximum links to return |
| depth | number | optional | Crawl depth for finding links |
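The links arrive as a list of `{href, text}` objects under `metadata.extracted_data.links` (see the example response below). A sketch of deduplicating them by `href` while preserving order, with hard-coded sample data standing in for a real response:

```python
# Stand-in for metadata["extracted_data"]["links"] from a real response.
links = [
    {"href": "https://books.toscrape.com/index.html", "text": "Books to Scrape"},
    {"href": "https://books.toscrape.com/index.html", "text": "Home"},
    {"href": "https://books.toscrape.com/category/books_1/index.html", "text": "Books"},
]

# Keep the first occurrence of each href, preserving order.
seen = set()
unique = []
for link in links:
    if link["href"] not in seen:
        seen.add(link["href"])
        unique.append(link)

print(unique)  # two entries: the duplicate index.html link is dropped
```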
Example Request
```shell
curl -X POST https://api.spider.cloud/ai/links \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
    "prompt": "Find all links to product pages and documentation",
    "limit": 100
  }'
```

```python
import requests

response = requests.post(
    "https://api.spider.cloud/ai/links",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    json={
        "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
        "prompt": "Find all links to product pages and documentation",
        "limit": 100
    }
)
print(response.json())
```

Example Response
```json
[
  {
    "content": null,
    "costs": {
      "ai_cost": 0.000193,
      "total_cost": 0.000194
    },
    "duration_elapsed_ms": 299,
    "error": null,
    "metadata": {
      "extracted_data": {
        "links": [
          {
            "href": "https://books.toscrape.com/index.html",
            "text": "Books to Scrape"
          },
          {
            "href": "https://books.toscrape.com/category/books_1/index.html",
            "text": "Books"
          },
          {
            "href": "https://books.toscrape.com/category/books/poetry_23/index.html",
            "text": "Poetry"
          }
        ]
      }
    },
    "status": 200,
    "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html"
  }
]
```