API Reference
The Spider API is based on REST. Our API is predictable, returns JSON-encoded responses, and uses standard HTTP response codes, authentication, and verbs. Set your API secret key in the Authorization header using the format `Bearer $TOKEN`. You can set the Content-Type header to `application/json`, `application/xml`, `text/csv`, or `application/jsonl` to shape the response.
The Spider API supports multi-domain actions. You can work with multiple domains per request by adding the URLs comma-separated.
The Spider API differs for every account as we release new versions and tailor functionality. You can add `v1` before any path to pin to that version. Executing a request on this page by pressing the Run button consumes live credits and treats the response as a genuine result.
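Putting these conventions together, here is a minimal helper for building request headers; the `make_headers` name and the `YOUR-API-KEY` fallback are illustrative placeholders, not part of the API:

```python
import os

BASE_URL = "https://api.spider.cloud"  # prepend /v1 to any path to pin that version

def make_headers(response_format: str = "application/json") -> dict:
    """Bearer auth plus a Content-Type that shapes the response format."""
    return {
        "Authorization": f"Bearer {os.getenv('SPIDER_API_KEY', 'YOUR-API-KEY')}",
        # application/json, application/xml, text/csv, or application/jsonl
        "Content-Type": response_format,
    }
```

The same headers work for every endpoint below; only the Content-Type changes when you want a different response shape.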
Just getting started?
Check out our development quickstart guide.
Not a developer?
Use Spider's no-code options or apps to get started and do more with your Spider account, no code required.
https://api.spider.cloud
Client libraries
Crawl websites
Start crawling a website or websites to collect resources. You can pass an array of objects for the request body.
POSThttps://api.spider.cloud/crawl
Request body
url required string
The URI resource to crawl. This can be a comma-separated list for multiple URLs.
To reduce latency, improve performance, and save on rate limits, batch multiple URLs into a single call. For large websites with high page limits, it is best to run requests individually.
limit number
The maximum number of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages. Defaults to 0. It is better to set a limit upfront on websites where you do not know the size; re-crawling can use the cache effectively to keep costs low as new pages are found.
store_data boolean
Decide whether to store data. If enabled, this option overrides storageless. Defaults to false. Set to true to collect resources you can download and re-use later. Please note: storing data incurs a fee of $0.30 per gigabyte per month.
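As a sketch of the batching advice above, several domains can be joined into the single comma-separated url field of one /crawl request; the `batch_crawl_payload` helper name is illustrative:

```python
def batch_crawl_payload(urls, limit=5, return_format="markdown"):
    """Build one /crawl request body covering several domains at once,
    using the comma-separated form of the url field."""
    return {"url": ",".join(urls), "limit": limit, "return_format": return_format}

# e.g. requests.post('https://api.spider.cloud/crawl', headers=headers,
#                    json=batch_crawl_payload(["https://a.com", "https://b.com"]))
```

This trades one rate-limit hit for many, which suits small sites; large sites with high limits are still better crawled one per request.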
import requests, os
headers = {
'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
'Content-Type': 'application/json',
}
json_data = {"limit":25,"return_format":"markdown","url":"https://spider.cloud"}
response = requests.post('https://api.spider.cloud/crawl',
headers=headers, json=json_data)
print(response.json())
[ { "content": "<resource>...", "error": null, "status": 200, "costs": { "ai_cost": 0, "compute_cost": 0.0001, "file_cost": 0.0002, "total_cost": 0.0004, "transform_cost": 0.0001 }, "url": "https://spider.cloud" }, // more content... ]
Search
Perform a search and gather a list of websites to start crawling and collect resources.
POSThttps://api.spider.cloud/search
Request body
search required string
The search query you want to search for.
limit number
The maximum number of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages. Defaults to 0. It is better to set a limit upfront on websites where you do not know the size; re-crawling can use the cache effectively to keep costs low as new pages are found.
store_data boolean
Decide whether to store data. If enabled, this option overrides storageless. Defaults to false. Set to true to collect resources you can download and re-use later. Please note: storing data incurs a fee of $0.30 per gigabyte per month.
import requests, os
headers = {
'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
'Content-Type': 'application/json',
}
json_data = {"search":"a sports website","search_limit":3,"limit":25,"return_format":"markdown"}
response = requests.post('https://api.spider.cloud/search',
headers=headers, json=json_data)
print(response.json())
[ { "content": "<resource>...", "error": null, "status": 200, "costs": { "ai_cost": 0, "compute_cost": 0.0001, "file_cost": 0.0002, "total_cost": 0.0004, "transform_cost": 0.0001 }, "url": "https://spider.cloud" }, // more content... ]
Crawl websites get links
Start crawling one or more websites to collect the links found. You can pass an array of objects for the request body.
POSThttps://api.spider.cloud/links
Request body
url required string
The URI resource to crawl. This can be a comma-separated list for multiple URLs.
To reduce latency, improve performance, and save on rate limits, batch multiple URLs into a single call. For large websites with high page limits, it is best to run requests individually.
limit number
The maximum number of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages. Defaults to 0. It is better to set a limit upfront on websites where you do not know the size; re-crawling can use the cache effectively to keep costs low as new pages are found.
store_data boolean
Decide whether to store data. If enabled, this option overrides storageless. Defaults to false. Set to true to collect resources you can download and re-use later. Please note: storing data incurs a fee of $0.30 per gigabyte per month.
import requests, os
headers = {
'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
'Content-Type': 'application/json',
}
json_data = {"limit":25,"return_format":"markdown","url":"https://spider.cloud"}
response = requests.post('https://api.spider.cloud/links',
headers=headers, json=json_data)
print(response.json())
[ { "url": "https://spider.cloud", "status": 200, "error": null }, // more content... ]
Screenshot websites
Take screenshots of websites, returned as base64 or binary encoding.
POSThttps://api.spider.cloud/screenshot
Request body
url required string
The URI resource to crawl. This can be a comma-separated list for multiple URLs.
To reduce latency, improve performance, and save on rate limits, batch multiple URLs into a single call. For large websites with high page limits, it is best to run requests individually.
limit number
The maximum number of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages. Defaults to 0. It is better to set a limit upfront on websites where you do not know the size; re-crawling can use the cache effectively to keep costs low as new pages are found.
store_data boolean
Decide whether to store data. If enabled, this option overrides storageless. Defaults to false. Set to true to collect resources you can download and re-use later. Please note: storing data incurs a fee of $0.30 per gigabyte per month.
import requests, os
headers = {
'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
'Content-Type': 'application/json',
}
json_data = {"limit":25,"url":"https://spider.cloud"}
response = requests.post('https://api.spider.cloud/screenshot',
headers=headers, json=json_data)
print(response.json())
[ { "content": "<resource>...", "error": null, "status": 200, "costs": { "ai_cost": 0, "compute_cost": 0.0001, "file_cost": 0.0002, "total_cost": 0.0004, "transform_cost": 0.0001 }, "url": "https://spider.cloud" }, // more content... ]
Transform HTML
Transform HTML to Markdown or text fast. Each HTML transformation costs 1 credit. You can send up to 10 MB of data at once. The transform API is also built into the /crawl endpoint via the return_format parameter.
POSThttps://api.spider.cloud/transform
Request body
data required object
A list of HTML data to transform. Each object takes the keys html and url. The url key is optional and only used when readability is enabled.
import requests, os
headers = {
'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
'Content-Type': 'application/json',
}
json_data = {"return_format":"markdown","data":[{"html":"<html>\n<head>\n <title>Example Transform</title> \n</head>\n<body>\n<div>\n <h1>Example Website</h1>\n <p>This is some example markup to use to test the transform function.</p>\n <p><a href=\"https://spider.cloud/guides\">Guides</a></p>\n</div>\n</body></html>","url":"https://example.com"}]}
response = requests.post('https://api.spider.cloud/transform',
headers=headers, json=json_data)
print(response.json())
{ "content": [ "Example Domain Example Domain ========== This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission. [More information...](https://www.iana.org/domains/example)" ], "cost": { "ai_cost": 0, "compute_cost": 0, "file_cost": 0, "total_cost": 0, "transform_cost": 0.0001 }, "error": null, "status": 200 }
Query
Query a resource from the global database instead of crawling a website. 1 credit per successful retrieval.
POSThttps://api.spider.cloud/data/query
Request body
url string
The exact path of the url that you want to get.
domain string
The website domain you want to query.
pathname string
The website pathname you want to query.
import requests, os
headers = {
'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
'Content-Type': 'application/json',
}
json_data = {"url":"https://spider.cloud"}
response = requests.post('https://api.spider.cloud/data/query',
headers=headers, json=json_data)
print(response.json())
{ "content": "<html> <body> <div> <h1>Example Website</h1> </div> </body> </html>", "error": null, "status": 200 }
Proxy-Mode Alpha
Spider also offers a proxy front-end to the service. The Spider proxy handles requests like any standard proxy, with the option to use high-performance residential proxies at up to 1 TB per second.
**HTTP address**: proxy.spider.cloud:8888
**HTTPS address**: proxy.spider.cloud:8889
**Username**: YOUR-API-KEY
**Password**: PARAMETERS
import requests, os
# Proxy configuration
proxies = {
    'http': f"http://{os.getenv('SPIDER_API_KEY')}:request=Raw&premium_proxy=False@proxy.spider.cloud:8888",
    'https': f"https://{os.getenv('SPIDER_API_KEY')}:request=Raw&premium_proxy=False@proxy.spider.cloud:8889"
}

# Function to make a request through the proxy
def get_via_proxy(url):
    try:
        response = requests.get(url, proxies=proxies)
        response.raise_for_status()
        print('Response HTTP Status Code: ', response.status_code)
        print('Response HTTP Response Body: ', response.content)
        return response.text
    except requests.exceptions.RequestException as e:
        print(f"Error: {e}")
        return None

# Example usage
if __name__ == "__main__":
    get_via_proxy("https://www.choosealicense.com")
    get_via_proxy("https://www.choosealicense.com/community")
Pipelines
Create powerful workflows with our pipeline API endpoints. Use AI to extract leads from any website or filter links with prompts.
Extract leads
Start crawling one or more websites to collect leads using AI. A minimum of $25 in credits is required for extraction.
POSThttps://api.spider.cloud/pipeline/extract-contacts
Request body
url required string
The URI resource to crawl. This can be a comma-separated list for multiple URLs.
To reduce latency, improve performance, and save on rate limits, batch multiple URLs into a single call. For large websites with high page limits, it is best to run requests individually.
limit number
The maximum number of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages. Defaults to 0. It is better to set a limit upfront on websites where you do not know the size; re-crawling can use the cache effectively to keep costs low as new pages are found.
store_data boolean
Decide whether to store data. If enabled, this option overrides storageless. Defaults to false. Set to true to collect resources you can download and re-use later. Please note: storing data incurs a fee of $0.30 per gigabyte per month.
import requests, os
headers = {
'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
'Content-Type': 'application/jsonl',
}
json_data = {"limit":25,"return_format":"markdown","url":"https://spider.cloud"}
response = requests.post('https://api.spider.cloud/pipeline/extract-contacts',
headers=headers, json=json_data)
print(response.json())
[ { "content": "<resource>...", "error": null, "status": 200, "costs": { "ai_cost": 0, "compute_cost": 0.0001, "file_cost": 0.0002, "total_cost": 0.0004, "transform_cost": 0.0001 }, "url": "https://spider.cloud" }, // more content... ]
Label website
Crawl a website and accurately categorize it using AI.
POSThttps://api.spider.cloud/pipeline/label
Request body
url required string
The URI resource to crawl. This can be a comma-separated list for multiple URLs.
To reduce latency, improve performance, and save on rate limits, batch multiple URLs into a single call. For large websites with high page limits, it is best to run requests individually.
limit number
The maximum number of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages. Defaults to 0. It is better to set a limit upfront on websites where you do not know the size; re-crawling can use the cache effectively to keep costs low as new pages are found.
store_data boolean
Decide whether to store data. If enabled, this option overrides storageless. Defaults to false. Set to true to collect resources you can download and re-use later. Please note: storing data incurs a fee of $0.30 per gigabyte per month.
import requests, os
headers = {
'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
'Content-Type': 'application/jsonl',
}
json_data = {"limit":25,"return_format":"markdown","url":"https://spider.cloud"}
response = requests.post('https://api.spider.cloud/pipeline/label',
headers=headers, json=json_data)
print(response.json())
[ { "content": "<resource>...", "error": null, "status": 200, "costs": { "ai_cost": 0, "compute_cost": 0.0001, "file_cost": 0.0002, "total_cost": 0.0004, "transform_cost": 0.0001 }, "url": "https://spider.cloud" }, // more content... ]
Crawl websites from text
Crawl websites found in raw text or markdown.
POSThttps://api.spider.cloud/pipeline/crawl-text
Request body
text required string
The text to extract URLs from. The maximum size of the text is 10 MB.
limit number
The maximum number of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages. Defaults to 0. It is better to set a limit upfront on websites where you do not know the size; re-crawling can use the cache effectively to keep costs low as new pages are found.
store_data boolean
Decide whether to store data. If enabled, this option overrides storageless. Defaults to false. Set to true to collect resources you can download and re-use later. Please note: storing data incurs a fee of $0.30 per gigabyte per month.
import requests, os
headers = {
'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
'Content-Type': 'application/jsonl',
}
json_data = {"text":"Check this link: https://example.com and email to example@email.com","limit":25,"return_format":"markdown"}
response = requests.post('https://api.spider.cloud/pipeline/crawl-text',
headers=headers, json=json_data)
print(response.json())
[ { "content": "<resource>...", "error": null, "status": 200, "costs": { "ai_cost": 0, "compute_cost": 0.0001, "file_cost": 0.0002, "total_cost": 0.0004, "transform_cost": 0.0001 }, "url": "https://spider.cloud" }, // more content... ]
Filter links
Filter links using AI and advanced metadata.
POSThttps://api.spider.cloud/pipeline/filter-links
Request body
url required array
The URLs to filter. You can pass up to 4k tokens for the links and prompt.
import requests, os
headers = {
'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
'Content-Type': 'application/jsonl',
}
json_data = {"url":["https://spider.cloud","https://foodnetwork.com"]}
response = requests.post('https://api.spider.cloud/pipeline/filter-links',
headers=headers, json=json_data)
print(response.json())
{ "content": [ { "relevant_urls": [ "https://spider.cloud", "https://foodnetwork.com" ] } ], "cost": { "ai_cost": 0.0005, "compute_cost": 0, "file_cost": 0, "total_cost": 0, "transform_cost": 0 }, "error": "", "status": 200 }
Questions and Answers
Get a question-and-answer list for a website based on any inquiry.
POSThttps://api.spider.cloud/pipeline/extract-qa
Request body
url required string
The URI resource to crawl. This can be a comma-separated list for multiple URLs.
To reduce latency, improve performance, and save on rate limits, batch multiple URLs into a single call. For large websites with high page limits, it is best to run requests individually.
limit number
The maximum number of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages. Defaults to 0. It is better to set a limit upfront on websites where you do not know the size; re-crawling can use the cache effectively to keep costs low as new pages are found.
store_data boolean
Decide whether to store data. If enabled, this option overrides storageless. Defaults to false. Set to true to collect resources you can download and re-use later. Please note: storing data incurs a fee of $0.30 per gigabyte per month.
import requests, os
headers = {
'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
'Content-Type': 'application/jsonl',
}
json_data = {"limit":25,"return_format":"markdown","url":"https://spider.cloud"}
response = requests.post('https://api.spider.cloud/pipeline/extract-qa',
headers=headers, json=json_data)
print(response.json())
{ "content": [ { "answer": "Spider is a data collecting solution designed for web crawling and scraping.", "question": "What is the primary function of Spider?" }, { "answer": "You can kickstart your data collecting projects by signing up for a free trial or taking advantage of the promotional credits offered.", "question": "How can I get started with Spider?" }, { "answer": "Spider offers unmatched speed, scalability, and comprehensive data curation, making it trusted by leading tech businesses.", "question": "What are the benefits of using Spider for data collection?" }, { "answer": "Spider can easily crawl, search, and extract data from various sources, including social media platforms.", "question": "What kind of data can Spider extract?" }, { "answer": "Spider is built fully in Rust for next-generation scalability.", "question": "What programming language is Spider built with?" } ], "cost": { "ai_cost": 0.0009, "compute_cost": 0.0001, "file_cost": 0, "total_cost": 0, "transform_cost": 0 }, "error": null, "status": 200, "url": "https://spider.cloud" }
Queries
Query the data that you collect during crawling and scraping. Add dynamic filters for extracting exactly what is needed.
Websites Collection
Get the websites stored.
GEThttps://api.spider.cloud/data/websites
Request body
limit string
The limit of records to get.
page number
The current page to get.
domain string
Filter a single domain record.
import requests, os
headers = {
'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
'Content-Type': 'application/jsonl',
}
response = requests.get('https://api.spider.cloud/data/websites?limit=25&return_format=markdown',
headers=headers)
print(response.json())
{ "data": [ { "id": "2a503c02-f161-444b-b1fa-03a3914667b6", "user_id": "6bd06efa-bb0b-4f1f-a23f-05db0c4b1bfd", "url": "6bd06efa-bb0b-4f1f-a29f-05db0c4b1bfd/example.com/index.html", "domain": "spider.cloud", "created_at": "2024-04-18T15:40:25.667063+00:00", "updated_at": "2024-04-18T15:40:25.667063+00:00", "pathname": "/", "fts": "", "scheme": "https:", "last_checked_at": "2024-05-10T13:39:32.293017+00:00", "screenshot": null } ] }
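Since the collection endpoints accept limit and page, the whole collection can be paged through until a short page signals the end. A sketch, where `fetch_page` and `iter_records` are illustrative names and `fetch_page(n)` stands in for the GET call above with `page=n`:

```python
def iter_records(fetch_page, limit=25):
    """Yield records across pages of a /data collection; a page with
    fewer than `limit` records marks the end."""
    page = 0
    while True:
        batch = fetch_page(page).get("data") or []
        yield from batch
        if len(batch) < limit:
            break
        page += 1

# e.g. fetch_page = lambda n: requests.get(
#     f'https://api.spider.cloud/data/websites?limit=25&page={n}',
#     headers=headers).json()
```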
Pages Collection
Get the pages/resources stored.
GEThttps://api.spider.cloud/data/pages
Request body
limit string
The limit of records to get.
page number
The current page to get.
domain string
Filter a single domain record.
import requests, os
headers = {
'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
'Content-Type': 'application/jsonl',
}
response = requests.get('https://api.spider.cloud/data/pages?limit=25&return_format=markdown',
headers=headers)
print(response.json())
{ "data": [ { "id": "733b0d0f-e406-4229-949d-8068ade54752", "user_id": "6bd06efa-bb0b-4f1f-a29f-05db0c4b1bfd", "url": "https://spider.cloud", "domain": "spider.cloud", "created_at": "2024-04-17T01:28:15.016975+00:00", "updated_at": "2024-04-17T01:28:15.016975+00:00", "proxy": true, "headless": true, "crawl_budget": null, "scheme": "https:", "last_checked_at": "2024-04-17T01:28:15.016975+00:00", "full_resources": false, "metadata": true, "gpt_config": null, "smart_mode": false, "fts": "'spider.cloud':1" } ] }
Pages Metadata Collection
Get the pages metadata/resources stored.
GEThttps://api.spider.cloud/data/pages_metadata
Request body
limit string
The limit of records to get.
page number
The current page to get.
domain string
Filter a single domain record.
import requests, os
headers = {
'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
'Content-Type': 'application/jsonl',
}
response = requests.get('https://api.spider.cloud/data/pages_metadata?limit=25&return_format=markdown',
headers=headers)
print(response.json())
{ "data": [ { "id": "e27a1995-2abe-4319-acd1-3dd8258f0f49", "user_id": "253524cd-3f94-4ed1-83b3-f7fab134c3ff", "url": "253524cd-3f94-4ed1-83b3-f7fab134c3ff/www.google.com/search?query=spider.cloud.html", "domain": "www.google.com", "resource_type": "html", "title": "spider.cloud - Google Search", "description": "", "file_size": 1253960, "embedding": null, "pathname": "/search", "created_at": "2024-05-18T17:40:16.4808+00:00", "updated_at": "2024-05-18T17:40:16.4808+00:00", "keywords": [ "Fastest Web Crawler spider", "Web scraping" ], "labels": "Search Engine", "extracted_data": null, "fts": "'/search':1" } ] }
Leads Collection
Get the pages contacts stored.
GEThttps://api.spider.cloud/data/contacts
Request body
limit string
The limit of records to get.
page number
The current page to get.
domain string
Filter a single domain record.
import requests, os
headers = {
'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
'Content-Type': 'application/jsonl',
}
response = requests.get('https://api.spider.cloud/data/contacts?limit=25&return_format=markdown',
headers=headers)
print(response.json())
{ "data": [ { "full_name": "John Doe", "email": "johndoe@gmail.com", "phone": "555-555-555", "title": "Baker" } ] }
Crawl State
Get the state of the crawl for the domain.
GEThttps://api.spider.cloud/data/crawl_state
Request body
limit string
The limit of records to get.
page number
The current page to get.
domain string
Filter a single domain record.
import requests, os
headers = {
'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
'Content-Type': 'application/jsonl',
}
response = requests.get('https://api.spider.cloud/data/crawl_state?limit=25&return_format=markdown',
headers=headers)
print(response.json())
{ "data": { "id": "195bf2f2-2821-421d-b89c-f27e57ca71fh", "user_id": "6bd06efa-bb0a-4f1f-a29f-05db0c4b1bfg", "domain": "spider.cloud", "url": "https://spider.cloud/", "links": 1, "credits_used": 3, "mode": 2, "crawl_duration": 340, "message": null, "request_user_agent": "Spider", "level": "info", "status_code": 0, "created_at": "2024-04-21T01:21:32.886863+00:00", "updated_at": "2024-04-21T01:21:32.886863+00:00" }, "error": null }
Crawl Logs
Get the last 24 hours of logs.
GEThttps://api.spider.cloud/data/crawl_logs
Request body
limit string
The limit of records to get.
page number
The current page to get.
domain string
Filter a single domain record.
import requests, os
headers = {
'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
'Content-Type': 'application/jsonl',
}
response = requests.get('https://api.spider.cloud/data/crawl_logs?limit=25&return_format=markdown',
headers=headers)
print(response.json())
{ "data": { "id": "195bf2f2-2821-421d-b89c-f27e57ca71fh", "user_id": "6bd06efa-bb0a-4f1f-a29f-05db0c4b1bfg", "domain": "spider.cloud", "url": "https://spider.cloud/", "links": 1, "credits_used": 3, "mode": 2, "crawl_duration": 340, "message": null, "request_user_agent": "Spider", "level": "info", "status_code": 0, "created_at": "2024-04-21T01:21:32.886863+00:00", "updated_at": "2024-04-21T01:21:32.886863+00:00" }, "error": null }
Credits
Get the remaining credits available.
GEThttps://api.spider.cloud/data/credits
import requests, os
headers = {
'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
'Content-Type': 'application/jsonl',
}
response = requests.get('https://api.spider.cloud/data/credits',
headers=headers)
print(response.json())
{ "data": { "id": "8d662167-5a5f-41aa-9cb8-0cbb7d536891", "user_id": "6bd06efa-bb0a-4f1f-a29f-05db0c4b1bfg", "credits": 53334, "created_at": "2024-04-21T01:21:32.886863+00:00", "updated_at": "2024-04-21T01:21:32.886863+00:00" } }
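A small pre-flight check against the payload above can guard an expensive crawl before it starts; the `has_credits` helper and the threshold are illustrative, not part of the API:

```python
def has_credits(credits_response: dict, needed: int = 1) -> bool:
    """True if the balance from /data/credits covers a planned run."""
    return credits_response.get("data", {}).get("credits", 0) >= needed

# e.g. if has_credits(requests.get('https://api.spider.cloud/data/credits',
#                                  headers=headers).json(), needed=100):
#     ... launch the crawl ...
```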
Crons
Get the cron jobs that are set to keep data fresh.
GEThttps://api.spider.cloud/data/crons
Request body
limit string
The limit of records to get.
page number
The current page to get.
domain string
Filter a single domain record.
import requests, os
headers = {
'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
'Content-Type': 'application/jsonl',
}
response = requests.get('https://api.spider.cloud/data/crons?limit=25&return_format=markdown',
headers=headers)
print(response.json())
User Profile
Get the profile of the user. This returns data such as approved limits and usage for the month.
GEThttps://api.spider.cloud/data/profile
import requests, os
headers = {
'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
'Content-Type': 'application/jsonl',
}
response = requests.get('https://api.spider.cloud/data/profile',
headers=headers)
print(response.json())
{ "data": { "id": "6bd06efa-bb0b-4f1f-a29f-05db0c4b1bfd", "email": "user@gmail.com", "stripe_id": "cus_OYO2rAhSQaYqHT", "is_deleted": null, "proxy": null, "headless": false, "billing_limit": 50, "billing_limit_soft": 120, "approved_usage": 0, "crawl_budget": { "*": 200 }, "usage": null, "has_subscription": false, "depth": null, "full_resources": false, "meta_data": true, "billing_allowed": false, "initial_promo": false } }
User-Agents
Get a real user agent to use for crawling.
GEThttps://api.spider.cloud/data/user_agents
Request body
limit string
The limit of records to get.
os string
Filter by device OS, e.g. Android, Mac OS, Windows, Linux, and more.
page number
The current page to get.
import requests, os
headers = {
'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
'Content-Type': 'application/jsonl',
}
response = requests.get('https://api.spider.cloud/data/user_agents?limit=25&return_format=markdown',
headers=headers)
print(response.json())
{ "data": { "agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36", "platform": "Chrome", "platform_version": "123.0.0.0", "device": "Macintosh", "os": "Mac OS", "os_version": "10.15.7", "cpu_architecture": "", "mobile": false, "device_type": "desktop" } }
Download file
Download a resource from storage.
GEThttps://api.spider.cloud/data/download
Request body
url string
The exact path of the url that you want to get.
domain string
The website domain you want to query.
pathname string
The website pathname you want to query.
import requests, os
headers = {
'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
'Content-Type': 'application/jsonl',
}
response = requests.get('https://api.spider.cloud/data/download',
headers=headers, params={'url': 'https://spider.cloud'})
print(response.json())
{ "data": "<file>" }
Manage
Configure data to enhance crawl efficiency: create, update, and delete records.
Website
Create or update a website by configuration.
POSThttps://api.spider.cloud/data/websites
Request body
url required string
The URI resource to crawl. This can be a comma-separated list for multiple URLs.
To reduce latency, improve performance, and save on rate limits, batch multiple URLs into a single call. For large websites with high page limits, it is best to run requests individually.
limit number
The maximum number of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages. Defaults to 0. It is better to set a limit upfront on websites where you do not know the size; re-crawling can use the cache effectively to keep costs low as new pages are found.
store_data boolean
Decide whether to store data. If enabled, this option overrides storageless. Defaults to false. Set to true to collect resources you can download and re-use later. Please note: storing data incurs a fee of $0.30 per gigabyte per month.
import requests, os
headers = {
'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
'Content-Type': 'application/jsonl',
}
json_data = {"limit":25,"return_format":"markdown","url":"https://spider.cloud"}
response = requests.post('https://api.spider.cloud/data/websites',
headers=headers, json=json_data)
print(response.json())
{ "data": null }
Website
Delete a website from your collection. Omit the url field in the body to delete all websites.
DELETEhttps://api.spider.cloud/data/websites
Request body
url required string
The URL of the record to delete. This can be a comma-separated list for multiple URLs. Batch multiple URLs into a single call to reduce latency and save on rate limits.
import requests, os
headers = {
'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
'Content-Type': 'application/jsonl',
}
json_data = {"url":"https://spider.cloud"}
response = requests.delete('https://api.spider.cloud/data/websites',
headers=headers, json=json_data)
print(response.json())
{ "data": null }
Pages
Delete a web page from your collection. Omit the url field in the body to delete all pages.
DELETEhttps://api.spider.cloud/data/pages
Request body
url required string
The URL of the record to delete. This can be a comma-separated list for multiple URLs. Batch multiple URLs into a single call to reduce latency and save on rate limits.
import requests, os
headers = {
'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
'Content-Type': 'application/jsonl',
}
json_data = {"url":"https://spider.cloud"}
response = requests.delete('https://api.spider.cloud/data/pages',
headers=headers, json=json_data)
print(response.json())
{ "data": null }
Pages Metadata
Delete a web page's metadata from your collection. Omit the url field in the body to delete all pages metadata.
DELETEhttps://api.spider.cloud/data/pages_metadata
Request body
url required string
The URL of the record to delete. This can be a comma-separated list for multiple URLs. Batch multiple URLs into a single call to reduce latency and save on rate limits.
import requests, os
headers = {
'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
'Content-Type': 'application/jsonl',
}
json_data = {"url":"https://spider.cloud"}
response = requests.delete('https://api.spider.cloud/data/pages_metadata',
headers=headers, json=json_data)
print(response.json())
{ "data": null }
Leads
Delete a contact or lead from your collection. Omit the url field in the body to delete all contacts.
DELETEhttps://api.spider.cloud/data/contacts
Request body
url required string
The URL of the record to delete. This can be a comma-separated list for multiple URLs. Batch multiple URLs into a single call to reduce latency and save on rate limits.
import requests, os
headers = {
'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
'Content-Type': 'application/jsonl',
}
json_data = {"url":"https://spider.cloud"}
response = requests.delete('https://api.spider.cloud/data/contacts',
headers=headers, json=json_data)
print(response.json())
{ "data": null }
Comprehensive Data APIs
Access JSON-ready data for various financial metrics, including currency exchange rates, commodity prices, and more.
Exchange Rates
Get exchange rates for up to 150 global currency pairs for just 1 credit per call. Rates are refreshed daily, so your financial transactions are based on current data. The endpoint integrates seamlessly with Stripe checkout, making it ideal for universal payments and precise currency conversions.
GEThttps://api.spider.cloud/data/exchange-rates
import requests, os
headers = {
'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
'Content-Type': 'application/jsonl',
}
response = requests.get('https://api.spider.cloud/data/exchange-rates',
headers=headers)
print(response.json())
{ "content": { "usd": 1, "eur": 0.9266, "aud": 1.5374 }, "error": null }
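Because the returned rates are USD-based (usd is 1), converting between any two currencies goes through USD. A sketch, assuming the content object from the response above is passed as the rates dict (the `convert` helper is illustrative):

```python
def convert(amount: float, src: str, dst: str, rates: dict) -> float:
    """Convert amount from src to dst via the USD-based rates
    returned by /data/exchange-rates."""
    return amount / rates[src] * rates[dst]

# e.g. rates = response.json()["content"]; convert(100, "eur", "aud", rates)
```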
Commodity Rates
Access comprehensive commodity prices, updated daily, for just 1 credit per call. Whether you work in agriculture, energy, metals, or another sector, these data-driven insights keep you equipped with the latest, most accurate information to make informed decisions and stay ahead in a competitive market.
GEThttps://api.spider.cloud/data/commodity-rates
import requests, os
headers = {
'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
'Content-Type': 'application/jsonl',
}
response = requests.get('https://api.spider.cloud/data/commodity-rates',
headers=headers)
print(response.json())
{ "content": { "energy": { "Crude Oil": 76.98, "Brent": 79.74 }, "metals": { "Gold": 2430.58, "Silver": 27.449 }, "agricultural": { "Soybeans": 1004.82, "Wheat": 542 }, "industrial": { "Bitumen": 3590, "Cobalt": 26500 }, "livestock": { "Feeder Cattle": 246.1513, "Live Cattle": 184.1172 }, "index": { "CRB Index": 325.77, "LME Index": 3793.2 }, "electricity": { "United Kingdom": 81.15, "Germany": 75.16 }, "other": { "NVDA": 104.75, "TSLA": 200 } }, "error": null }