Web Scraping API
Extract content from any web page in a single request. Spider handles JavaScript rendering, anti-bot challenges, and content extraction so you get clean, structured data without managing browser infrastructure.
Scrape vs. Crawl: When to Use Which
Use Scrape When
- You know the exact URL(s) you want
- You need the fastest possible response time
- You're building a real-time data pipeline for specific pages
- You want to target precise elements with CSS selectors
Use Crawl When
- You need to discover and collect multiple pages from a site
- You want to follow links and explore a domain recursively
- You're building a dataset from an entire website
- You don't know all the URLs ahead of time
Key Capabilities
CSS & XPath Extraction
Define a css_extraction_map to pull specific elements. Extract prices from product pages, headlines from articles, or any targeted data using selectors you already know.
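As a sketch, the extraction map is just a JSON object in your request params whose keys name the fields you want back and whose values are the selectors. The field names and selectors below are illustrative, not taken from a real page:

```python
import json

# Hypothetical product-page selectors; substitute ones that match your target page.
params = {
    "return_format": "markdown",
    "css_extraction_map": {
        "title": "h1.product-title",
        "price": ".price-value",
    },
}

# The map travels as plain JSON in the request body.
print(json.dumps(params, indent=2))
```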
JavaScript Rendering
Modern websites render content with JavaScript. Spider's chrome and smart request modes ensure you see the same content as a real browser.
Batch URLs
Pass multiple URLs in a single request by comma-separating them or sending an array of objects. Reduce round-trips and latency when you have a list of known pages.
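For the comma-separated form, you can build the combined URL string from a known list before sending the request (the page URLs here are placeholders):

```python
# Build a single comma-separated URL string from a known list of pages.
urls = [
    "https://example.com/page-1",
    "https://example.com/page-2",
    "https://example.com/page-3",
]
batch = ",".join(urls)
print(batch)
# One request now covers all three pages instead of three round-trips.
```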
Readability Mode
Enable readability to strip away navigation, sidebars, footers, and ads. Get only the main article or body content, ideal for content analysis and NLP tasks.
JSON Data Extraction
Extract structured JSON-LD, server-rendered data, and JavaScript-embedded objects from pages with return_json_data.
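Spider surfaces this data for you server-side when return_json_data is set. As a purely local illustration of the kind of data the option extracts, here is a JSON-LD block pulled out of raw HTML with the standard library (the HTML snippet is made up):

```python
import json
import re

# A minimal page fragment containing a JSON-LD product record.
html = """
<script type="application/ld+json">
{"@type": "Product", "name": "Widget", "offers": {"price": "19.99"}}
</script>
"""

# Locate the script body and parse it as JSON.
match = re.search(
    r'<script type="application/ld\+json">\s*(\{.*?\})\s*</script>',
    html,
    re.DOTALL,
)
data = json.loads(match.group(1))
print(data["name"], data["offers"]["price"])  # → Widget 19.99
```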
Custom JavaScript
Run your own JavaScript on the page before extraction with evaluate_on_new_document. Click buttons, dismiss modals, or transform the DOM.
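The script is passed as a string and runs in the page context before extraction. A minimal sketch, assuming a hypothetical cookie-banner selector:

```python
# JavaScript to run on the page before content is extracted.
# The '.cookie-banner .accept' selector is hypothetical; use one that
# matches the element you need to click on your target page.
dismiss_modal = """
document.querySelector('.cookie-banner .accept')?.click();
"""

params = {
    "return_format": "markdown",
    "evaluate_on_new_document": dismiss_modal.strip(),
}
print(params["evaluate_on_new_document"])
```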
Code Examples
Python:

```python
from spider import Spider

client = Spider()
result = client.scrape(
    "https://example.com/pricing",
    params={
        "return_format": "markdown",
        "metadata": True,
    },
)
print(result[0]["content"])
```

cURL:

```shell
curl -X POST https://api.spider.cloud/scrape \
  -H "Authorization: Bearer $SPIDER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/products",
    "css_extraction_map": {
      "title": "h1.product-title",
      "price": ".price-value",
      "description": ".product-description"
    }
  }'
```

TypeScript:

```typescript
import Spider from "@spider-cloud/spider-client";

const client = new Spider();

// Comma-separated URLs in a single request
const pages = await client.scrape(
  "https://example.com/page-1,https://example.com/page-2",
  {
    return_format: "text",
    return_headers: true,
  },
);
```

What You Get Back
Standard Response
- url: The final URL after any redirects
- content: Page content in your chosen format
- status: HTTP status code of the response
With Optional Fields
- metadata: Title, description, keywords
- links: All links found on the page
- headers: HTTP response headers
- cookies: Cookies set by the page
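Putting the fields together, a response can be handled as a list of page objects. The example values below are assumed for illustration; only the field names come from the list above:

```python
# A sample response shaped like the documented fields, with made-up values.
response = [
    {
        "url": "https://example.com/pricing",
        "status": 200,
        "content": "# Pricing\n...",
        "metadata": {"title": "Pricing", "description": "Plans and pricing"},
    }
]

# Keep only successfully fetched pages and read their metadata.
for page in response:
    if page["status"] == 200:
        print(page["url"], "->", page["metadata"]["title"])
```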