Data Connectors
Pipe results straight into cloud storage or a database as pages come back. The data_connectors parameter works on every endpoint (crawl, scrape, search, screenshot). Spider sends each result to your destination as soon as it's ready.
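As a sketch of the endpoint-agnostic shape, the request below attaches the same connector object to a single-page scrape instead of a crawl. It assumes /scrape accepts the same url and return_format fields as /crawl; the bucket, credentials, and target URL are placeholders, and the s3 fields are documented in the Amazon S3 section below.
Stream a single scrape result to S3
import requests, os
headers = {
    "Authorization": f"Bearer {os.getenv('SPIDER_API_KEY')}",
    "Content-Type": "application/json",
}
# Same data_connectors shape as the crawl examples, sent to /scrape instead.
response = requests.post("https://api.spider.cloud/scrape", headers=headers, json={
    "url": "https://example.com",
    "return_format": "markdown",
    "data_connectors": {
        "s3": {
            "bucket": "my-crawl-data",
            "access_key_id": os.getenv("AWS_ACCESS_KEY_ID"),
            "secret_access_key": os.getenv("AWS_SECRET_ACCESS_KEY")
        },
        "on_find": True
    }
})
print(response.json())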
Supported Connectors
Pick one or more per request. They fire in parallel and work alongside webhooks.
| Connector | Key | Description |
|---|---|---|
| Amazon S3 | s3 | Upload each page as a JSON object to an S3 bucket. |
| Google Cloud Storage | gcs | Upload pages to a GCS bucket using service account credentials. |
| Google Sheets | google_sheets | Append rows to a spreadsheet as pages are processed. |
| Azure Blob Storage | azure_blob | Write pages to an Azure Storage container. |
| Supabase | supabase | Insert rows into a Supabase Postgres table via PostgREST. |
Event Triggers
Two boolean flags on the data_connectors object decide when results get sent.
on_find: Send the full page content as soon as it's ready. Most common option.
on_find_metadata: Send lightweight metadata only (URL, status, headers) without the body.
Both flags default to false. You must set at least one to true or the connector will not fire. Use on_find for most use cases; a metadata-only request is sketched below.
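For instance, a crawl that only records URLs, status codes, and headers in S3 could look like the following sketch. The bucket, credentials, and crawl target are placeholders; the s3 fields are documented in the Amazon S3 section below.
Metadata-only delivery sketch
import requests, os
headers = {
    "Authorization": f"Bearer {os.getenv('SPIDER_API_KEY')}",
    "Content-Type": "application/json",
}
# Only on_find_metadata is enabled, so the connector receives URL, status,
# and headers for each page, but not the page body.
response = requests.post("https://api.spider.cloud/crawl", headers=headers, json={
    "url": "https://example.com",
    "limit": 50,
    "data_connectors": {
        "s3": {
            "bucket": "my-crawl-data",
            "access_key_id": os.getenv("AWS_ACCESS_KEY_ID"),
            "secret_access_key": os.getenv("AWS_SECRET_ACCESS_KEY")
        },
        "on_find_metadata": True
    }
})
print(response.json())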
Amazon S3
Upload each page as a JSON object to an S3 bucket. Objects are keyed by domain and timestamp.
| Field | Required | Description |
|---|---|---|
| bucket | Yes | The S3 bucket name. |
| access_key_id | Yes | AWS access key ID. |
| secret_access_key | Yes | AWS secret access key. |
| region | No | AWS region. Defaults to us-east-1. |
| prefix | No | Key prefix for uploaded objects (e.g. "crawls/2024/"). |
| content_type | No | MIME type for objects. Defaults to application/json. |
Stream results to S3
import requests, os
headers = {
"Authorization": f"Bearer {os.getenv('SPIDER_API_KEY')}",
"Content-Type": "application/json",
}
response = requests.post("https://api.spider.cloud/crawl", headers=headers, json={
"url": "https://example.com",
"limit": 50,
"return_format": "markdown",
"data_connectors": {
"s3": {
"bucket": "my-crawl-data",
"access_key_id": os.getenv("AWS_ACCESS_KEY_ID"),
"secret_access_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
"region": "us-west-2",
"prefix": "crawls/"
},
"on_find": True
}
})
print(response.json())
Google Cloud Storage
Upload pages to a GCS bucket. Pass the same service account JSON key you download from the IAM & Admin console, base64-encoded.
| Field | Required | Description |
|---|---|---|
| bucket | Yes | The GCS bucket name. |
| service_account_base64 | Yes | Base64-encoded service account JSON key. |
| prefix | No | Key prefix for uploaded objects. |
Stream results to GCS
import requests, os, base64
headers = {
"Authorization": f"Bearer {os.getenv('SPIDER_API_KEY')}",
"Content-Type": "application/json",
}
# base64-encode your service account JSON file
with open("service-account.json", "rb") as f:
sa_b64 = base64.b64encode(f.read()).decode()
response = requests.post("https://api.spider.cloud/crawl", headers=headers, json={
"url": "https://example.com",
"limit": 50,
"return_format": "markdown",
"data_connectors": {
"gcs": {
"bucket": "my-gcs-bucket",
"service_account_base64": sa_b64,
"prefix": "spider-data/"
},
"on_find": True
}
})
print(response.json())
Google Sheets
Append results as rows to a Google Sheets spreadsheet. Share the spreadsheet with your service account email and set sheet_name to target a specific tab.
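If you are not sure which address to share the spreadsheet with, it is the client_email field inside the service account JSON key. A quick way to read it:
import json
# Print the service account's email address; share the spreadsheet with it.
with open("service-account.json") as f:
    print(json.load(f)["client_email"])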
| Field | Required | Description |
|---|---|---|
| spreadsheet_id | Yes | The spreadsheet ID from the Google Sheets URL. |
| service_account_base64 | Yes | Base64-encoded service account JSON key. |
| sheet_name | No | Target sheet tab. Defaults to "Sheet1". |
Stream results to Google Sheets
import requests, os, base64
headers = {
"Authorization": f"Bearer {os.getenv('SPIDER_API_KEY')}",
"Content-Type": "application/json",
}
with open("service-account.json", "rb") as f:
sa_b64 = base64.b64encode(f.read()).decode()
response = requests.post("https://api.spider.cloud/crawl", headers=headers, json={
"url": "https://example.com",
"limit": 20,
"return_format": "markdown",
"data_connectors": {
"google_sheets": {
"spreadsheet_id": "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms",
"service_account_base64": sa_b64,
"sheet_name": "Crawl Results"
},
"on_find": True
}
})
print(response.json())
Azure Blob Storage
Write each page to an Azure Storage container. Pass the full connection string from the Azure portal.
| Field | Required | Description |
|---|---|---|
| connection_string | Yes | Azure Storage connection string. |
| container | Yes | The container name. |
| prefix | No | Blob name prefix for uploaded objects. |
Stream results to Azure Blob Storage
import requests, os
headers = {
"Authorization": f"Bearer {os.getenv('SPIDER_API_KEY')}",
"Content-Type": "application/json",
}
response = requests.post("https://api.spider.cloud/crawl", headers=headers, json={
"url": "https://example.com",
"limit": 50,
"return_format": "markdown",
"data_connectors": {
"azure_blob": {
"connection_string": os.getenv("AZURE_STORAGE_CONNECTION_STRING"),
"container": "crawl-data",
"prefix": "results/"
},
"on_find": True
}
})
print(response.json())
Supabase
Insert results into a Supabase Postgres table via PostgREST. Rows are batched automatically so you don't need to handle pagination.
| Field | Required | Description |
|---|---|---|
| url | Yes | Supabase project URL (e.g. https://xxx.supabase.co). |
| anon_key | Yes | Supabase anon or service role key. |
| table | Yes | Target table name for row inserts. |
Stream results to Supabase
import requests, os
headers = {
"Authorization": f"Bearer {os.getenv('SPIDER_API_KEY')}",
"Content-Type": "application/json",
}
response = requests.post("https://api.spider.cloud/crawl", headers=headers, json={
"url": "https://example.com",
"limit": 50,
"return_format": "markdown",
"data_connectors": {
"supabase": {
"url": "https://your-project.supabase.co",
"anon_key": os.getenv("SUPABASE_ANON_KEY"),
"table": "crawled_pages"
},
"on_find": True
}
})
print(response.json())
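To confirm rows are landing, you can query the table once the crawl is running. This sketch assumes the supabase Python client (pip install supabase) and the crawled_pages table from the example above; swap in your own project URL and table name.
import os
from supabase import create_client
# Connect with the same project URL and key passed to the connector.
client = create_client("https://your-project.supabase.co", os.getenv("SUPABASE_ANON_KEY"))
# Fetch a handful of inserted rows to verify delivery.
rows = client.table("crawled_pages").select("*").limit(5).execute()
print(rows.data)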
Multiple Connectors
You can configure more than one connector in the same request. All active connectors fire in parallel alongside webhooks.
S3 + Supabase in a single request
{
"url": "https://example.com",
"limit": 100,
"data_connectors": {
"s3": {
"bucket": "archive-bucket",
"access_key_id": "AKIA...",
"secret_access_key": "wJal..."
},
"supabase": {
"url": "https://xxx.supabase.co",
"anon_key": "eyJhbGci...",
"table": "pages"
},
"on_find": true
}
}