Searching with Spider
Instant Search
Spider’s search endpoint returns results in under 2 seconds. Combine it with scraping to discover and extract content in a single request. Common use cases:
- Feeding real-time content into large language models (LLMs)
- Building intelligent agents and data pipelines
- Crawling and collecting fresh, targeted data
Search Endpoint Usage
POST /search
Use this endpoint to compile a list of relevant websites for crawling and resource collection.
Search Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| search | string | Yes | — | The search query to execute. Supports advanced search operators (e.g. site:, intitle:, filetype:). |
| search_limit | integer | No | 0 (all) | Maximum number of results to return, applied as a filter on top of the raw results. Set to 0 to return all results. |
| num | integer | No | 10 | Number of results the search engine returns per search page. |
| page | integer | No | 1 | The search results page to retrieve. Use with num for manual pagination. |
| fetch_page_content | boolean | No | false | When true, Spider crawls each search result URL and returns the full page content. Standard crawl parameters apply (see below). |
| location | string | No | — | The geographic location the search should originate from (e.g. "San Diego, CA", "London, UK"). |
| country | string | No | — | Two-letter country code to prioritize results from (e.g. "us", "fr", "de"). |
| language | string | No | — | Two-letter language code for search results (e.g. "en", "es", "ja"). |
| tbs | string | No | — | Time-based search filter that restricts results to a specific time period. See values below. |
| quick_search | boolean | No | true | Enables fast search mode with parallel provider queries and automatic retries for broader coverage. Set to false for single-provider queries. |
| auto_pagination | boolean | No | false | Automatically paginate through search result pages until search_limit is reached. Useful for collecting large result sets without manually incrementing page. |
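The num and page parameters support manual pagination when you want control over each request. A minimal sketch of such a loop, assuming a caller-supplied fetch(payload) helper that POSTs the payload to /search and returns the parsed JSON list (the helper name and loop logic are illustrative, not part of the API):

```python
def paginate_search(fetch, query, per_page=10, max_results=30):
    """Collect results across pages by incrementing `page` manually.

    `fetch` is a caller-supplied function that POSTs a payload to
    /search and returns the parsed JSON list of results.
    """
    results = []
    page = 1
    while len(results) < max_results:
        batch = fetch({"search": query, "num": per_page, "page": page})
        if not batch:  # an empty page means no more results
            break
        results.extend(batch)
        page += 1
    return results[:max_results]
```

In practice, auto_pagination handles this loop for you; manual paging is mainly useful when you need per-page rate control or checkpointing.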
Time-Based Search Filters (tbs)
| Value | Description |
|---|---|
| qdr:h | Past hour |
| qdr:d | Past 24 hours |
| qdr:w | Past week |
| qdr:m | Past month |
| qdr:y | Past year |
Crawl Parameters (when fetch_page_content is true)
When fetch_page_content is enabled, all standard crawl parameters are available alongside search parameters. These let you control how Spider processes each result URL. Common options include:
| Parameter | Type | Description |
|---|---|---|
| return_format | string | Content format for crawled pages: "markdown", "raw", "text", "html2text", etc. |
| limit | integer | Maximum pages to crawl per result URL. Set to 1 to scrape only the landing page. |
| readability | boolean | Apply readability pre-processing to extract the main article content. |
| root_selector | string | CSS selector to extract only specific content from each page. |
| exclude_selector | string | CSS selector to exclude elements from the extracted content. |
| proxy | string | Proxy type for the crawl (e.g. "residential", "datacenter"). |
| request_timeout | integer | Timeout in seconds for each page request. |
| headers | object | Custom HTTP headers to send with each crawl request. |
| locale | string | Locale for the crawl request (e.g. "en-US"). |
| stealth | boolean | Enable stealth mode for the browser. |
| webhooks | object | Webhook destination to stream results to as they arrive. |
For the full list of crawl parameters, see the Scraping and Crawling docs.
Example Request (Python)
import os
import requests

headers = {
    'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
    'Content-Type': 'application/json',
}

json_data = {
    "search": "sports news today",
    "search_limit": 5,
}

response = requests.post('https://api.spider.cloud/search', headers=headers, json=json_data)
print(response.json())
Search Results Format
The API returns structured results as an array of objects:
[
  {
    "title": "ESPN – Serving Sports Fans. Anytime. Anywhere.",
    "description": "Visit ESPN for live scores, highlights and sports news. Stream exclusive games on ESPN+ and play fantasy sports.",
    "url": "https://www.espn.com/"
  },
  {
    "title": "Sports Illustrated",
    "description": "SI.com provides sports news, expert analysis, highlights, stats and scores for the NFL, NBA, MLB, NHL, college football, soccer...",
    "url": "https://www.si.com/"
  }
]
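Each result object can be consumed directly. For instance, to pull out just the URLs for a follow-up crawl request (the sample data below mirrors the shape shown above):

```python
# Sample results in the shape returned by the search endpoint.
results = [
    {"title": "ESPN", "description": "...", "url": "https://www.espn.com/"},
    {"title": "Sports Illustrated", "description": "...", "url": "https://www.si.com/"},
]

# Collect the URLs, e.g. to feed into a separate crawl request.
urls = [r["url"] for r in results]
```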
Search and Scrape in One Request
Set fetch_page_content: true to search and scrape in one request. All standard crawl parameters work alongside search parameters, so you can control output format, depth, and proxy settings.
import os
import requests

headers = {
    'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
    'Content-Type': 'application/json',
}

json_data = {
    "search": "spider web crawling",
    "return_format": "raw",
    "fetch_page_content": True,
    "search_limit": 10,
    "limit": 1,
}

response = requests.post('https://api.spider.cloud/search', headers=headers, json=json_data)
print(response.json())
When fetch_page_content is enabled, the response format changes to include full crawl data:
[
  {
    "error": null,
    "status": 200,
    "duration_elasped_ms": 120,
    "costs": {
      "file_cost": 0.000363,
      "ai_cost": 0,
      "compute_cost": 7e-8,
      "transform_cost": 0,
      "total_cost": 0.00036307,
      "bytes_transferred_cost": 0
    },
    "url": "https://en.wikipedia.org/wiki/Web_crawler",
    "content": "<!DOCTYPE html><html><body>content...</body></html>"
  }
]
This is useful for:
- Extracting full content from top search results
- Automated research and summarization pipelines
- Reducing round-trips in data collection workflows
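A sketch of post-processing these crawl objects, filtering out failed fetches and totaling cost; the field names come from the response example above, and the sample values are illustrative:

```python
# Fetched results in the shape shown in the response example (values illustrative).
fetched = [
    {"error": None, "status": 200,
     "costs": {"total_cost": 0.00036307},
     "url": "https://en.wikipedia.org/wiki/Web_crawler",
     "content": "<!DOCTYPE html>..."},
    {"error": "request timed out", "status": 0,
     "costs": {"total_cost": 0.0},
     "url": "https://example.com/",
     "content": None},
]

# Keep only successful fetches, and sum what the batch cost.
pages = [r for r in fetched if r["error"] is None and r["status"] == 200]
total_cost = sum(r["costs"]["total_cost"] for r in fetched)
```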
Geo-Targeted Search
Use location, country, and language to get results as a user in a specific region would see them.
import os
import requests

headers = {
    'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
    'Content-Type': 'application/json',
}

json_data = {
    "search": "latest sports news",
    "search_limit": 5,
    "language": "en",
    "country": "us",
    "location": "San Diego, CA",
}

response = requests.post('https://api.spider.cloud/search', headers=headers, json=json_data)
print(response.json())
Time-Filtered Search
Restrict results to a specific time period with the tbs parameter.
import os
import requests

headers = {
    'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
    'Content-Type': 'application/json',
}

json_data = {
    "search": "AI breakthroughs",
    "search_limit": 10,
    "tbs": "qdr:w",
}

response = requests.post('https://api.spider.cloud/search', headers=headers, json=json_data)
print(response.json())
Auto-Pagination
Set auto_pagination: true to automatically collect results across multiple search pages until your search_limit is reached.
import os
import requests

headers = {
    'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
    'Content-Type': 'application/json',
}

json_data = {
    "search": "machine learning tutorials",
    "search_limit": 50,
    "auto_pagination": True,
}

response = requests.post('https://api.spider.cloud/search', headers=headers, json=json_data)
print(response.json())
Batch Multiple Queries
Send an array of query objects to execute multiple searches in a single API call. Each query runs independently and returns its own result set. Streaming is not supported for batch requests.
import os
import requests

headers = {
    'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
    'Content-Type': 'application/json',
}

json_data = [
    {
        "search": "latest sports news united states",
        "search_limit": 5,
    },
    {
        "search": "latest news around the globe",
        "search_limit": 5,
    },
]

response = requests.post('https://api.spider.cloud/search', headers=headers, json=json_data)
print(response.json())
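Because the batch body is just an array of query objects, it can be generated programmatically from a plain list of query strings (the queries below are illustrative):

```python
queries = ["latest sports news united states", "latest news around the globe"]

# Build one payload object per query, each with its own result limit.
json_data = [{"search": q, "search_limit": 5} for q in queries]
```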
Rate Limits
- Up to 50,000 search requests per minute
- Multiple search providers for redundancy
- Distributed crawling and parsing for fetched content