API Reference

The Spider API is based on REST. It is predictable, returns JSON-encoded responses, and uses standard HTTP response codes and authentication.

Set your API secret key in the Authorization header using the format Bearer $TOKEN. You can set the Content-Type header to application/json, application/xml, text/csv, or application/jsonl to shape the response format.

The Spider API supports bulk updates. You can work on multiple objects per request for the core API endpoints.

You can add v1 before any path to pin that API version. Executing a request on this page with the Run button consumes live credits and returns a genuine response.
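
Below is a minimal sketch that puts these pieces together: a Bearer token in the Authorization header, a Content-Type of application/json, and a version-pinned path. The exact /v1/crawl path form and the small request body are assumptions for illustration; the Crawl endpoint itself is documented below.

Example authenticated request
import requests, os

headers = {
    'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',  # Bearer $TOKEN format
    'Content-Type': 'application/json',  # or application/xml, text/csv, application/jsonl
}

# Pinning the API version by prefixing the path with v1 (path form assumed from the note above).
response = requests.post('https://api.spider.cloud/v1/crawl',
  headers=headers, json={"limit": 1, "url": "https://example.com"})

print(response.status_code, response.json())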

Download the OpenAPI specification.

Just getting started?

Check out our development quickstart guide.

Not a developer?

Use Spider's no-code options or applications to get started and do more with your Spider account, no code required.

Base URL
https://api.spider.cloud

Crawl

Start crawling website(s) to collect resources. You can pass an array of objects for the request body.

POST https://api.spider.cloud/crawl

Request body

  • url required string

The URI resource to crawl. This can be a comma-separated list for multiple URLs.

To reduce latency, enhance performance, and save on rate limits, batch multiple URLs into a single call (see the batch example after the response below). For large websites with high page limits, it's best to run requests individually.

  • limit number

The maximum number of pages to crawl per website. Remove the value or set it to 0 to crawl all pages. Defaults to 0.

    It is better to set a limit upfront on websites where you do not know the size. Re-crawling can effectively use cache to keep costs low as new pages are found.

  • lite_mode boolean

    Lite mode reduces data transfer costs by 50%, with trade-offs in speed, accuracy, geo-targeting, and reliability. It’s best suited for non-urgent data collection or when targeting websites with minimal anti-bot protections.

Request
import requests, os

headers = {
    'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
    'Content-Type': 'application/json',
}

json_data = {"limit":5,"return_format":"markdown","url":"https://spider.cloud"}

response = requests.post('https://api.spider.cloud/crawl', 
  headers=headers, json=json_data)

print(response.json())
Response
[
  {
    "content": "<resource>...",
    "error": null,
    "status": 200,
    "costs": {
      "ai_cost": 0,
      "compute_cost": 0.00001,
      "file_cost": 0.00002,
      "bytes_transferred_cost": 0.00002,
      "total_cost": 0.00004,
      "transform_cost": 0.0001
    },
    "url": "https://spider.cloud"
  },
  // more content...
]
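
For the batching tip above, here is a minimal sketch that sends several URLs in one call. Both forms follow the request-body notes in this section (a comma-separated url string, or an array of request objects); the mix of fields shown is illustrative.

Example batch request
import requests, os

headers = {
    'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
    'Content-Type': 'application/json',
}

# Option 1: several URLs as a comma-separated string in one object.
json_data = {"limit": 2, "return_format": "markdown",
             "url": "https://spider.cloud,https://example.com"}

# Option 2: an array of request objects, one per website.
# json_data = [
#     {"limit": 2, "return_format": "markdown", "url": "https://spider.cloud"},
#     {"limit": 2, "return_format": "markdown", "url": "https://example.com"},
# ]

response = requests.post('https://api.spider.cloud/crawl',
  headers=headers, json=json_data)

print(response.json())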

Scrape

Start scraping a single page on website(s) to collect resources. You can pass an array of objects for the request body.

POST https://api.spider.cloud/scrape

Request body

  • url required string

The URI resource to crawl. This can be a comma-separated list for multiple URLs.

To reduce latency, enhance performance, and save on rate limits, batch multiple URLs into a single call. For large websites with high page limits, it's best to run requests individually.

  • lite_mode boolean

    Lite mode reduces data transfer costs by 50%, with trade-offs in speed, accuracy, geo-targeting, and reliability. It’s best suited for non-urgent data collection or when targeting websites with minimal anti-bot protections.

Request
import requests, os

headers = {
    'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
    'Content-Type': 'application/json',
}

json_data = {"return_format":"markdown","url":"https://spider.cloud"}

response = requests.post('https://api.spider.cloud/scrape', 
  headers=headers, json=json_data)

print(response.json())
Response
[
  {
    "content": "<resource>...",
    "error": null,
    "status": 200,
    "costs": {
      "ai_cost": 0,
      "compute_cost": 0.00001,
      "file_cost": 0.00002,
      "bytes_transferred_cost": 0.00002,
      "total_cost": 0.00004,
      "transform_cost": 0.0001
    },
    "url": "https://spider.cloud"
  },
  // more content...
]

Search

Perform a Google search to gather a list of websites for crawling and resource collection, with fallback options if the query yields no results. You can pass an array of objects for the request body.

POST https://api.spider.cloud/search

Request body

  • search required string

The search query you want to perform.

  • limit number

The maximum number of pages to crawl per website. Remove the value or set it to 0 to crawl all pages. Defaults to 0.

    It is better to set a limit upfront on websites where you do not know the size. Re-crawling can effectively use cache to keep costs low as new pages are found.

  • quick_search boolean

    Prioritize speed over output quantity.

Request
import requests, os

headers = {
    'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
    'Content-Type': 'application/json',
}

json_data = {"search":"sports news today","search_limit":3,"limit":5,"return_format":"markdown"}

response = requests.post('https://api.spider.cloud/search', 
  headers=headers, json=json_data)

print(response.json())
Response
{
  "content": [
      {
          "description": "Visit ESPN for live scores, highlights and sports news. Stream exclusive games on ESPN+ and play fantasy sports.",
          "title": "ESPN - Serving Sports Fans. Anytime. Anywhere.",
          "url": "https://www.espn.com/"
      },
      {
          "description": "Sports Illustrated, SI.com provides sports news, expert analysis, highlights, stats and scores for the NFL, NBA, MLB, NHL, college football, soccer,&nbsp;...",
          "title": "Sports Illustrated",
          "url": "https://www.si.com/"
      },
      {
          "description": "CBS Sports features live scoring, news, stats, and player info for NFL football, MLB baseball, NBA basketball, NHL hockey, college basketball and football.",
          "title": "CBS Sports - News, Live Scores, Schedules, Fantasy ...",
          "url": "https://www.cbssports.com/"
      },
      {
          "description": "Sport is a form of physical activity or game. Often competitive and organized, sports use, maintain, or improve physical ability and skills.",
          "title": "Sport",
          "url": "https://en.wikipedia.org/wiki/Sport"
      },
      {
          "description": "Watch FOX Sports and view live scores, odds, team news, player news, streams, videos, stats, standings &amp; schedules covering NFL, MLB, NASCAR, WWE, NBA, NHL,&nbsp;...",
          "title": "FOX Sports News, Scores, Schedules, Odds, Shows, Streams ...",
          "url": "https://www.foxsports.com/"
      },
      {
          "description": "Founded in 1974 by tennis legend, Billie Jean King, the Women's Sports Foundation is dedicated to creating leaders by providing girls access to sports.",
          "title": "Women's Sports Foundation: Home",
          "url": "https://www.womenssportsfoundation.org/"
      },
      {
          "description": "List of sports · Running. Marathon · Sprint · Mascot race · Airsoft · Laser tag · Paintball · Bobsleigh · Jack jumping · Luge · Shovel racing · Card stacking&nbsp;...",
          "title": "List of sports",
          "url": "https://en.wikipedia.org/wiki/List_of_sports"
      },
      {
          "description": "Stay up-to-date with the latest sports news and scores from NBC Sports.",
          "title": "NBC Sports - news, scores, stats, rumors, videos, and more",
          "url": "https://www.nbcsports.com/"
      },
      {
          "description": "r/sports: Sports News and Highlights from the NFL, NBA, NHL, MLB, MLS, and leagues around the world.",
          "title": "r/sports",
          "url": "https://www.reddit.com/r/sports/"
      },
      {
          "description": "The A-Z of sports covered by the BBC Sport team. Find all the latest live sports coverage, breaking news, results, scores, fixtures, tables,&nbsp;...",
          "title": "AZ Sport",
          "url": "https://www.bbc.com/sport/all-sports"
      }
  ]
}
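
Since the search endpoint returns a list of URLs, a common follow-up is to feed those results into /crawl. The sketch below assumes the response shape shown above (a content array of objects with url fields) and batches the URLs into one crawl call; error handling is omitted.

Example search-to-crawl request
import requests, os

headers = {
    'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
    'Content-Type': 'application/json',
}

# 1. Search for candidate websites.
search = requests.post('https://api.spider.cloud/search',
  headers=headers, json={"search": "sports news today", "search_limit": 3}).json()

# 2. Crawl the returned URLs in one batched call (assumes the "content" array shown above).
urls = [item["url"] for item in search.get("content", [])]
crawl = requests.post('https://api.spider.cloud/crawl',
  headers=headers, json={"limit": 5, "return_format": "markdown", "url": ",".join(urls)})

print(crawl.json())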

Links

Start crawling website(s) to collect the links found. You can pass an array of objects for the request body. This endpoint saves on latency if you only need to index the content URLs.

POST https://api.spider.cloud/links

Request body

  • url required string

The URI resource to crawl. This can be a comma-separated list for multiple URLs.

To reduce latency, enhance performance, and save on rate limits, batch multiple URLs into a single call. For large websites with high page limits, it's best to run requests individually.

  • limit number

The maximum number of pages to crawl per website. Remove the value or set it to 0 to crawl all pages. Defaults to 0.

    It is better to set a limit upfront on websites where you do not know the size. Re-crawling can effectively use cache to keep costs low as new pages are found.

  • lite_mode boolean

    Lite mode reduces data transfer costs by 50%, with trade-offs in speed, accuracy, geo-targeting, and reliability. It’s best suited for non-urgent data collection or when targeting websites with minimal anti-bot protections.

Request
import requests, os

headers = {
    'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
    'Content-Type': 'application/json',
}

json_data = {"limit":5,"return_format":"markdown","url":"https://spider.cloud"}

response = requests.post('https://api.spider.cloud/links', 
  headers=headers, json=json_data)

print(response.json())
Response
[
  {
    "url": "https://spider.cloud",
    "status": 200,
    "error": null
  },
  // more content...
]

Screenshot

Take screenshots of a website, returned as base64 or binary encoding. You can pass an array of objects for the request body.

POST https://api.spider.cloud/screenshot

Request body

  • url required string

The URI resource to crawl. This can be a comma-separated list for multiple URLs.

To reduce latency, enhance performance, and save on rate limits, batch multiple URLs into a single call. For large websites with high page limits, it's best to run requests individually.

  • limit number

The maximum number of pages to crawl per website. Remove the value or set it to 0 to crawl all pages. Defaults to 0.

    It is better to set a limit upfront on websites where you do not know the size. Re-crawling can effectively use cache to keep costs low as new pages are found.

  • lite_mode boolean

    Lite mode reduces data transfer costs by 50%, with trade-offs in speed, accuracy, geo-targeting, and reliability. It’s best suited for non-urgent data collection or when targeting websites with minimal anti-bot protections.

Request
import requests, os

headers = {
    'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
    'Content-Type': 'application/json',
}

json_data = {"limit":5,"url":"https://spider.cloud"}

response = requests.post('https://api.spider.cloud/screenshot', 
  headers=headers, json=json_data)

print(response.json())
Response
[
  {
    "content": "<resource>...",
    "error": null,
    "status": 200,
    "costs": {
      "ai_cost": 0,
      "compute_cost": 0.00001,
      "file_cost": 0.00002,
      "bytes_transferred_cost": 0.00002,
      "total_cost": 0.00004,
      "transform_cost": 0.0001
    },
    "url": "https://spider.cloud"
  },
  // more content...
]
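
A minimal sketch for persisting a screenshot result follows. It assumes the content field of each item carries base64-encoded image data and that PNG is an appropriate file extension; both are assumptions, so check your response format before relying on this.

Example screenshot download
import base64
import requests, os

headers = {
    'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
    'Content-Type': 'application/json',
}

response = requests.post('https://api.spider.cloud/screenshot',
  headers=headers, json={"limit": 1, "url": "https://spider.cloud"})

# Assumption: "content" holds base64-encoded image data for each page (PNG assumed).
for i, item in enumerate(response.json()):
    if item.get("content"):
        with open(f"screenshot-{i}.png", "wb") as f:
            f.write(base64.b64decode(item["content"]))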

Transform HTML

Transform HTML to Markdown or text quickly. Each HTML transformation costs 0.1 credits. You can send up to 10MB of data at once. The transform API is also built into the /crawl endpoint via the return_format parameter.

POST https://api.spider.cloud/transform

Request body

  • data required object

A list of HTML objects to transform. Each object takes the keys html and url. The url key is optional and only used when readability is enabled.

Request
import requests, os

headers = {
    'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
    'Content-Type': 'application/json',
}

json_data = {"return_format":"markdown","data":[{"html":"<html><body>\n<h1>Example Website</h1>\n<p>This is some example markup to use to test the transform function.</p>\n<p><a href=\"https://spider.cloud/guides\">Guides</a></p>\n</body></html>","url":"https://example.com"}]}

response = requests.post('https://api.spider.cloud/transform', 
  headers=headers, json=json_data)

print(response.json())
Response
{
  "content": [
    "# Example Website\nThis is some example markup to use to test the transform function.\n[Guides](https://spider.cloud/guides)"
  ],
  "cost": {
    "ai_cost": 0,
    "compute_cost": 0,
    "file_cost": 0,
    "bytes_transferred_cost": 0,
    "total_cost": 0,
    "transform_cost": 0.0001
  },
  "error": null,
  "status": 200
}

Query

Query a resource from the global database instead of crawling a website. Each successful retrieval costs 0.1 credits.

POST https://api.spider.cloud/data/query

Request body

  • url string

The exact URL of the resource you want to get.

  • domain string

    The website domain you want to query.

  • pathname string

    The website pathname you want to query.

Request
import requests, os

headers = {
    'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
    'Content-Type': 'application/json',
}

json_data = {"url":"https://spider.cloud"}

response = requests.post('https://api.spider.cloud/data/query', 
  headers=headers, json=json_data)

print(response.json())
Response
{
  "content": "<html>
    <body>
      <div>
          <h1>Example Website</h1>
      </div>
    </body>
  </html>",
  "error": null,
  "status": 200
}
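
The example above queries by the full url field; the same endpoint also accepts the domain and pathname fields documented above. A minimal sketch follows, assuming the two fields can be combined to address a single stored record.

Example query by domain and pathname
import requests, os

headers = {
    'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
    'Content-Type': 'application/json',
}

# Assumption: domain and pathname together identify one stored resource.
json_data = {"domain": "spider.cloud", "pathname": "/guides"}

response = requests.post('https://api.spider.cloud/data/query',
  headers=headers, json=json_data)

print(response.json())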

Proxy-Mode

Spider also offers a proxy front-end to the service. The Spider proxy handles requests just like any standard proxy, with the option to use high-performance and residential proxies at up to 10 GB/s. Take a look at all of our proxy locations to see if we support the country you need.

**HTTP address**: proxy.spider.cloud:8888
**HTTPS address**: proxy.spider.cloud:8889
**Username**: YOUR-API-KEY
**Password**: PARAMETERS

Residential

  • Speed: Up to 1GB/s
  • Purpose: Real-User IPs, Global Reach, High Anonymity
  • Cost: $1/GB - $4/GB

ISP

  • Speed: Up to 10GB/s
  • Purpose: Stable Datacenter IPs, Highest Performance
  • Cost: $1/GB

Mobile

  • Speed: Up to 100MB/s
  • Purpose: Real Mobile Devices, Avoid Detection
  • Cost: $2/GB

Use the country_code parameter to set the proxy geolocation and the proxy parameter to select the proxy type.

| Proxy Type | Price | Multiplier | Description |
| --- | --- | --- | --- |
| residential | $1.00/GB | ×1 | Entry-level residential pool |
| residential_static | $1.00/GB | ×1 | Static IPs for long-lived sessions |
| residential_fast | $1.50/GB | ×1.5 | High-speed residential for heavy throughput |
| residential_core | $1.50/GB | ×1.5 | Balanced quality and cost |
| residential_plus | $3.00/GB | ×3.0 | Largest, highest-quality residential pool |
| residential_premium | $4.00/GB | ×4.0 | Low-latency premium residential pool |
| mobile | $2.00/GB | ×2.0 | 4G/5G mobile proxies for stealth |
| isp | $1.00/GB | ×1 | ISP-grade residential routing |
Example proxy request
import requests, os


# Proxy configuration
proxies = {
    'http': f"http://{os.getenv('SPIDER_API_KEY')}:proxy=residential@proxy.spider.cloud:8888",
    'https': f"https://{os.getenv('SPIDER_API_KEY')}:proxy=residential@proxy.spider.cloud:8889"
}

# Function to make a request through the proxy
def get_via_proxy(url):
    try:
        response = requests.get(url, proxies=proxies)
        response.raise_for_status()
        print('Response HTTP Status Code: ', response.status_code)
        print('Response HTTP Response Body: ', response.content)
        return response.text
    except requests.exceptions.RequestException as e:
        print(f"Error: {e}")
        return None

# Example usage
if __name__ == "__main__":
    get_via_proxy("https://www.example.com")
    get_via_proxy("https://www.example.com/community")
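
The proxy type is selected through parameters in the password field, as in the example above. The sketch below also pins geolocation with the country_code parameter; joining multiple parameters with "&" is an assumption here, so confirm the exact password syntax against the proxy documentation.

Example proxy request with geolocation
import requests, os

# Assumption: multiple password parameters are joined with "&".
params = "country_code=us&proxy=residential_premium"

proxies = {
    'http': f"http://{os.getenv('SPIDER_API_KEY')}:{params}@proxy.spider.cloud:8888",
    'https': f"https://{os.getenv('SPIDER_API_KEY')}:{params}@proxy.spider.cloud:8889",
}

response = requests.get("https://www.example.com", proxies=proxies)
print(response.status_code)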

Queries

Query the data that you collect during crawling and scraping. Add dynamic filters for extracting exactly what is needed.

Logs

Get the last 24 hours of logs.

GET https://api.spider.cloud/data/crawl_logs

Request params

  • url string

    Filter a single url record.

  • limit string

The maximum number of records to return.

  • domain string

    Filter a single domain record.

  • page number

    The current page to get.

Request
import requests, os

headers = {
    'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
    'Content-Type': 'application/jsonl',
}

# Query params: requests handles URL encoding of the url filter.
response = requests.get('https://api.spider.cloud/data/crawl_logs',
  headers=headers, params={'limit': 5, 'url': 'https://spider.cloud'})

print(response.json())
Response
{
  "data": {
    "id": "195bf2f2-2821-421d-b89c-f27e57ca71fh",
    "user_id": "6bd06efa-bb0a-4f1f-a29f-05db0c4b1bfg",
    "domain": "spider.cloud",
    "url": "https://spider.cloud",
    "links": 1,
    "credits_used": 3,
    "mode": 2,
    "crawl_duration": 340,
    "message": null,
    "request_user_agent": "Spider",
    "level": "UI",
    "status_code": 0,
    "created_at": "2024-04-21T01:21:32.886863+00:00",
    "updated_at": "2024-04-21T01:21:32.886863+00:00"
  },
  "error": null
}
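
The page parameter documented above can be used to walk older records. A minimal sketch follows; the starting page index is an assumption, so adjust if your account paginates from 0 or 1.

Example paginated logs request
import requests, os

headers = {
    'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
    'Content-Type': 'application/jsonl',
}

# Fetch a later page of logs for one domain (page indexing assumed).
params = {'domain': 'spider.cloud', 'limit': 25, 'page': 1}

response = requests.get('https://api.spider.cloud/data/crawl_logs',
  headers=headers, params=params)

print(response.json())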

Credits

Get the remaining credits available.

GET https://api.spider.cloud/data/credits
Request
import requests, os

headers = {
    'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
    'Content-Type': 'application/jsonl',
}

response = requests.get('https://api.spider.cloud/data/credits',
  headers=headers)

print(response.json())
Response
{
  "data": {
    "id": "8d662167-5a5f-41aa-9cb8-0cbb7d536891",
    "user_id": "6bd06efa-bb0a-4f1f-a29f-05db0c4b1bfg",
    "credits": 53334,
    "created_at": "2024-04-21T01:21:32.886863+00:00",
    "updated_at": "2024-04-21T01:21:32.886863+00:00"
  }
}