API Reference

The Spider API is based on REST. It is predictable, returns JSON-encoded responses, and uses standard HTTP response codes, authentication, and verbs. To get started, set your API secret key in the Authorization header. You can set the content-type header to application/json, application/xml, text/csv, or application/jsonl to shape the response format, as sketched below.
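For example, a JSONL response can be consumed line by line, which suits larger crawls. This is a minimal sketch, assuming the content-type header shapes the response as described above; the request body mirrors the crawl example further below.

import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    # Per the note above, the content-type header shapes the response;
    # application/jsonl returns one JSON object per line.
    'Content-Type': 'application/jsonl',
}

response = requests.post('https://api.spider.cloud/crawl',
  headers=headers,
  json={"limit": 5, "url": "http://www.example.com"},
  stream=True)

for line in response.iter_lines():
    if line:
        print(line.decode())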

The Spider API supports multi-domain actions. You can work with multiple domains in one request by passing the urls as a comma-separated list, as shown below.
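For instance, a minimal sketch crawling two domains in a single request; it mirrors the crawl example later in this document, and the second domain is only illustrative.

import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

# Two domains in one request: pass the urls comma separated.
json_data = {"limit": 5, "url": "http://www.example.com,https://spider.cloud"}

response = requests.post('https://api.spider.cloud/crawl',
  headers=headers,
  json=json_data)

print(response.json())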

The Spider API can differ between accounts as we release new versions and tailor functionality. You can add v1 before any path to pin to that version, as shown below.
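A minimal sketch of version pinning; it assumes only that the v1 prefix is accepted in front of existing paths, per the note above.

import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

# Pin the API version by adding v1 before the path.
response = requests.post('https://api.spider.cloud/v1/crawl',
  headers=headers,
  json={"limit": 1, "url": "http://www.example.com"})

print(response.status_code)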

Just getting started?

Check out our development quickstart guide.

Not a developer?

Use Spider's no-code options or apps to get started with Spider and do more with your Spider account, no code required.

Base URL
https://api.spider.cloud

Crawl websites

Start crawling one or more websites to collect resources.

POST https://api.spider.cloud/crawl

Request body

  • url required string

The URI resource to crawl. This can be a comma-separated list for multiple urls.

  • request string

The request type to perform. Possible values are http, chrome, and smart. Use smart to perform an HTTP request by default, falling back to JavaScript rendering when the HTML requires it.

  • limit number

The maximum number of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages.

Example request
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

json_data = {"limit":50,"url":"http://www.example.com"}

response = requests.post('https://api.spider.cloud/crawl', 
  headers=headers, 
  json=json_data)

print(response.json())
Response
[
  {
    "content": "<html>...",
    "error": null,
    "status": 200,
    "url": "http://www.example.com"
  },
  // more content...
]
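Each array element corresponds to one crawled page, so responses are typically consumed by iterating them. Building on the example above, a minimal sketch that skips failed pages before using the HTML:

# Iterate the crawl response from the example above, skipping pages
# that returned an error or a non-200 status.
for page in response.json():
    if page["error"] is not None or page["status"] != 200:
        continue
    html = page["content"]
    print(page["url"], len(html))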

Crawl websites get links

Start crawling one or more websites to collect the links found.

POST https://api.spider.cloud/links

Request body

  • url required string

The URI resource to crawl. This can be a comma-separated list for multiple urls.

  • request string

The request type to perform. Possible values are http, chrome, and smart. Use smart to perform an HTTP request by default, falling back to JavaScript rendering when the HTML requires it.

  • limit number

The maximum number of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages.

Example request
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

json_data = {"limit":50,"url":"http://www.example.com"}

response = requests.post('https://api.spider.cloud/links', 
  headers=headers, 
  json=json_data)

print(response.json())
Response
[
  {
    "content": "",
    "error": null,
    "status": 200,
    "url": "http://www.example.com"
  },
  // more content...
]

Screenshot websites

Start taking screenshots of one or more websites to collect images as base64 or binary.

POST https://api.spider.cloud/screenshot

Request body

  • url required string

The URI resource to crawl. This can be a comma-separated list for multiple urls.

  • request string

The request type to perform. Possible values are http, chrome, and smart. Use smart to perform an HTTP request by default, falling back to JavaScript rendering when the HTML requires it.

  • limit number

The maximum number of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages.

Example request
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

json_data = {"limit":50,"url":"http://www.example.com"}

response = requests.post('https://api.spider.cloud/screenshot', 
  headers=headers, 
  json=json_data)

print(response.json())
Response
[
  {
    "content": "base64...",
    "error": null,
    "status": 200,
    "url": "http://www.example.com"
  },
  // more content...
]
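The base64 content can be decoded and written straight to disk. A minimal sketch building on the example above; the .png filename is an assumption about the screenshot format.

import base64

# Decode each base64 screenshot from the response above and save it.
for index, page in enumerate(response.json()):
    if page.get("content"):
        # The .png extension is an assumption about the image format.
        with open(f"screenshot-{index}.png", "wb") as f:
            f.write(base64.b64decode(page["content"]))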

Pipelines

Create powerful workflows with our pipeline API endpoints. Use AI to extract contacts from any website or to filter links with prompts.

Crawl websites and extract contacts

Start crawling one or more websites to collect all contacts using AI. A minimum of $25 in credits is required for extraction.

POST https://api.spider.cloud/pipeline/extract-contacts

Request body

  • url required string

The URI resource to crawl. This can be a comma-separated list for multiple urls.

  • request string

The request type to perform. Possible values are http, chrome, and smart. Use smart to perform an HTTP request by default, falling back to JavaScript rendering when the HTML requires it.

  • limit number

The maximum number of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages.

Example request
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

json_data = {"limit":50,"url":"http://www.example.com"}

response = requests.post('https://api.spider.cloud/pipeline/extract-contacts', 
  headers=headers, 
  json=json_data)

print(response.json())
Response
[
  {
    "content": [{
      "full_name": "John Doe",
      "email": "johndoe@gmail.com",
      "phone": "555-555-555",
      "title": "Baker"
    }],
    "error": null,
    "status": 200,
    "url": "http://www.example.com"
  },
  // more content...
]
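The extracted contacts can be flattened for downstream use. A minimal sketch that writes the response above to a CSV file; the field names follow the sample response, and the filename is illustrative.

import csv

# Flatten the extract-contacts response from the example above into a CSV.
fields = ["url", "full_name", "email", "phone", "title"]

with open("contacts.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for page in response.json():
        for contact in page.get("content") or []:
            writer.writerow({**contact, "url": page["url"]})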

Label website

Crawl a website and accurately categorize it using AI.

POST https://api.spider.cloud/pipeline/label

Request body

  • url required string

The URI resource to crawl. This can be a comma-separated list for multiple urls.

  • request string

The request type to perform. Possible values are http, chrome, and smart. Use smart to perform an HTTP request by default, falling back to JavaScript rendering when the HTML requires it.

  • limit number

The maximum number of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages.

Example request
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

json_data = {"limit":50,"url":"http://www.example.com"}

response = requests.post('https://api.spider.cloud/pipeline/label', 
  headers=headers, 
  json=json_data)

print(response.json())
Response
[
  {
    "content": ["Government"],
    "error": null,
    "status": 200,
    "url": "http://www.example.com"
  },
  // more content...
]

Queries

Query the data you collect. Add dynamic filters to extract exactly what you need.

Websites Collection

Get the websites stored.

GET https://api.spider.cloud/data/websites

Request params

  • limit string

    The maximum number of records to return.

  • page number

    The page of results to return.

  • domain string

    Filter results to a single domain.

Example request
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

response = requests.get('https://api.spider.cloud/data/websites',
  headers=headers)

print(response.json())
Response
{
 "data": [
  {
    "id": "2a503c02-f161-444b-b1fa-03a3914667b6",
    "user_id": "6bd06efa-bb0b-4f1f-a29f-05db0c4b1bfd",
    "url": "6bd06efa-bb0b-4f1f-a29f-05db0c4b1bfd/example.com/index.html",
    "domain": "example.com",
    "created_at": "2024-04-18T15:40:25.667063+00:00",
    "updated_at": "2024-04-18T15:40:25.667063+00:00",
    "pathname": "/",
    "fts": "",
    "scheme": "https:",
    "last_checked_at": "2024-05-10T13:39:32.293017+00:00",
    "screenshot": null
  }
 ] 
} 
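The documented params can be sent as query parameters. A minimal sketch, assuming the endpoint honors limit, page, and domain as described above; whether page numbering starts at 0 or 1 is not stated here.

import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
}

# Filter the stored websites by domain and page through the records.
params = {"limit": 10, "page": 0, "domain": "example.com"}

response = requests.get('https://api.spider.cloud/data/websites',
  headers=headers,
  params=params)

print(response.json())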

Pages Collection

Get the pages/resources stored.

GET https://api.spider.cloud/data/pages

Request params

  • limit string

    The maximum number of records to return.

  • page number

    The page of results to return.

  • domain string

    Filter results to a single domain.

Example request
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

response = requests.get('https://api.spider.cloud/data/pages',
  headers=headers)

print(response.json())
Response
{
  "data": [
    {
      "id": "733b0d0f-e406-4229-949d-8068ade54752",
      "user_id": "6bd06efa-bb0b-4f1f-a29f-05db0c4b1bfd",
      "url": "https://www.example.com",
      "domain": "www.example.com",
      "created_at": "2024-04-17T01:28:15.016975+00:00",
      "updated_at": "2024-04-17T01:28:15.016975+00:00",
      "proxy": true,
      "headless": true,
      "crawl_budget": null,
      "scheme": "https:",
      "last_checked_at": "2024-04-17T01:28:15.016975+00:00",
      "full_resources": false,
      "metadata": true,
      "gpt_config": null,
      "smart_mode": false,
      "fts": "'www.example.com':1"
    }
  ]
}

Pages Metadata Collection

Get the metadata stored for pages/resources.

GET https://api.spider.cloud/data/pages_metadata

Request params

  • limit string

    The maximum number of records to return.

  • page number

    The page of results to return.

  • domain string

    Filter results to a single domain.

Example request
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

response = requests.get('https://api.spider.cloud/data/pages_metadata',
  headers=headers)

print(response.json())
Response
{
 "data": [
  {
    "id": "e27a1995-2abe-4319-acd1-3dd8258f0f49",
    "user_id": "253524cd-3f94-4ed1-83b3-f7fab134c3ff",
    "url": "253524cd-3f94-4ed1-83b3-f7fab134c3ff/www.google.com/search?query=spider.cloud.html",
    "domain": "www.google.com",
    "resource_type": "html",
    "title": "spider.cloud - Google Search",
    "description": "",
    "file_size": 1253960,
    "embedding": null,
    "pathname": "/search",
    "created_at": "2024-05-18T17:40:16.4808+00:00",
    "updated_at": "2024-05-18T17:40:16.4808+00:00",
    "keywords": [
      "Fastest Web Crawler spider",
      "Web scraping",
    ],
    "labels": "Search Engine",
    "extracted_data": null,
    "fts": "'/search':1"
  }
 ] 
}

Contacts Collection

Get the contacts stored for pages.

GET https://api.spider.cloud/data/contacts

Request params

  • limit string

    The maximum number of records to return.

  • page number

    The page of results to return.

  • domain string

    Filter results to a single domain.

Example request
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

response = requests.get('https://api.spider.cloud/data/contacts',
  headers=headers)

print(response.json())
Response

Crawl State

Get the state of the crawl for the domain.

GET https://api.spider.cloud/data/crawl_state

Request params

  • limit string

    The maximum number of records to return.

  • page number

    The page of results to return.

  • domain string

    Filter results to a single domain.

Example request
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

response = requests.get('https://api.spider.cloud/data/crawl_state',
  headers=headers)

print(response.json())
Response
{
    "data": {
      "id": "195bf2f2-2821-421d-b89c-f27e57ca71fh",
      "user_id": "6bd06efa-bb0a-4f1f-a29f-05db0c4b1bfg",
      "domain": "example.com",
      "url": "https://example.com/",
      "links": 1,
      "credits_used": 3,
      "mode": 2,
      "crawl_duration": 340,
      "message": null,
      "request_user_agent": "Spider",
      "level": "info",
      "status_code": 0,
      "created_at": "2024-04-21T01:21:32.886863+00:00",
      "updated_at": "2024-04-21T01:21:32.886863+00:00"
    },
    "error": ""
  }

Crawl Logs

Get the last 24 hours of logs.

GET https://api.spider.cloud/data/crawl_logs

Request params

  • limit string

    The maximum number of records to return.

  • page number

    The page of results to return.

  • domain string

    Filter results to a single domain.

Example request
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

response = requests.get('https://api.spider.cloud/data/crawl_logs',
  headers=headers)

print(response.json())
Response
{
    "data": {
      "id": "195bf2f2-2821-421d-b89c-f27e57ca71fh",
      "user_id": "6bd06efa-bb0a-4f1f-a29f-05db0c4b1bfg",
      "domain": "example.com",
      "url": "https://example.com/",
      "links": 1,
      "credits_used": 3,
      "mode": 2,
      "crawl_duration": 340,
      "message": null,
      "request_user_agent": "Spider",
      "level": "info",
      "status_code": 0,
      "created_at": "2024-04-21T01:21:32.886863+00:00",
      "updated_at": "2024-04-21T01:21:32.886863+00:00"
    },
    "error": ""
  }

Credits

Get the remaining credits available.

GET https://api.spider.cloud/data/credits

Example request
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

response = requests.get('https://api.spider.cloud/data/credits',
  headers=headers)

print(response.json())
Response
{
    "data": {
      "id": "8d662167-5a5f-41aa-9cb8-0cbb7d536891",
      "user_id": "6bd06efa-bb0a-4f1f-a29f-05db0c4b1bfg",
      "credits": 53334,
      "created_at": "2024-04-21T01:21:32.886863+00:00",
      "updated_at": "2024-04-21T01:21:32.886863+00:00"
    }
  }
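A minimal sketch that reads the credits field from the response above before kicking off a larger job; the threshold is illustrative.

import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
}

data = requests.get('https://api.spider.cloud/data/credits',
  headers=headers).json()["data"]

# The threshold is illustrative; pick one that fits your workload.
if data["credits"] < 1000:
    print("Running low on credits:", data["credits"])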

Crons

Get the cron jobs that are set to keep data fresh.

GET https://api.spider.cloud/data/crons

Request params

  • limit string

    The maximum number of records to return.

  • page number

    The page of results to return.

  • domain string

    Filter results to a single domain.

Example request
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

response = requests.get('https://api.spider.cloud/data/crons',
  headers=headers)

print(response.json())
Response

User Profile

Get the profile of the user. This returns data such as approved limits and usage for the month.

GET https://api.spider.cloud/data/profiles

Example request
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

response = requests.get('https://api.spider.cloud/data/profiles',
  headers=headers)

print(response.json())
Response
{
 "data": [
  {
    "id": "6bd06efa-bb0b-4f1f-a29f-05db0c4b1bfd",
    "email": "user@gmail.com",
    "stripe_id": "cus_OYO2rAhSQaYqHT",
    "is_deleted": null,
    "proxy": null,
    "headless": false,
    "billing_limit": 50,
    "billing_limit_soft": 120,
    "approved_usage": 0,
    "crawl_budget": {
        "*": 200
    },
    "usage": null,
    "has_subscription": false,
    "depth": null,
    "full_resources": false,
    "meta_data": true,
    "billing_allowed": false,
    "initial_promo": false
  }
 ] 
}

User-Agent

Get a real user agent to use for crawling.

GET https://api.spider.cloud/data/user_agents

Request params

  • limit string

    The maximum number of records to return.

  • os string

    Filter by operating system, e.g. Android, Mac OS, Windows, Linux, and more.

  • page number

    The page of results to return.

Example request
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

response = requests.get('https://api.spider.cloud/data/user_agents',
  headers=headers)

print(response.json())
Response
{
    "data": {
      "agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36",
      "platform": "Chrome",
      "platform_version": "123.0.0.0",
      "device": "Macintosh",
      "os": "Mac OS",
      "os_version": "10.15.7",
      "cpu_architecture": "",
      "mobile": false,
      "device_type": "desktop"
    }
  }
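One way to consume the returned record is to reuse the agent string in your own HTTP client. A minimal sketch; whether the crawl endpoints accept a user agent parameter is not documented here, so the example only sets the header on a plain request.

import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
}

response = requests.get('https://api.spider.cloud/data/user_agents',
  headers=headers)

agent = response.json()["data"]["agent"]

# Reuse the real user agent string for your own requests.
page = requests.get('http://www.example.com',
  headers={'User-Agent': agent})

print(page.status_code)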