
Content & Publishing

Batch your sources, zero format wrangling

You need full articles from dozens of sites in a consistent format. Not truncated RSS. Not raw HTML. Pass a comma-separated list of URLs to Spider's crawl endpoint. It handles JavaScript rendering when needed, strips the noise with readability extraction, and hands you clean markdown. Same shape every time, regardless of where it came from.
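A minimal sketch of building that batch request in TypeScript. The field names (`url` as a comma-separated list, `readability`, `metadata`, `return_format`) follow the prose on this page; the exact payload shape and endpoint are assumptions, not a verified client API:

```typescript
// Sketch: batch several article URLs into one crawl request.
// Payload fields mirror the prose above; treat them as assumptions.
interface CrawlParams {
  url: string;               // comma-separated list of URLs
  return_format: "markdown"; // clean markdown back, same shape every time
  readability: boolean;      // strip nav/ads, keep the article
  metadata: boolean;         // title, description, og_image, ...
}

function buildCrawlParams(urls: string[]): CrawlParams {
  return {
    url: urls.join(","), // one request for the whole batch
    return_format: "markdown",
    readability: true,
    metadata: true,
  };
}

// Usage (the network call itself is omitted here):
const params = buildCrawlParams([
  "https://example.com/post-1",
  "https://example.com/post-2",
]);
```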

Your scrapers are breaking

Every custom scraper is a ticking clock. Sites redesign. Paywalls change their cookie flow. JavaScript frameworks swap out the DOM. You find out at 2am when your pipeline goes silent.

One API replaces all of them. Batch your URLs into a single request. No selectors to maintain, no rendering to manage, no format glue code.

03:41:12 ERR reuters scraper: selector .article-body not found (site redesign?)
03:41:12 WARN reuters: falling back to RSS... got 2 sentences + "read more"
03:41:14 ERR medium scraper: 403 Forbidden (cloudflare challenge)
03:41:15 ERR substack scraper: empty body (content loads via JS, scraper is HTTP-only)
03:41:18 WARN dev.to: got HTML but readability extraction returned null
03:41:19 ERR techcrunch: paywall cookie expired, need manual browser login
03:41:19 INFO pipeline: 1/6 sources returned usable content
03:41:19 INFO paging on-call engineer...

From noise to signal

Every web page is 90% navigation, ads, and layout. Spider isolates the article and returns just the content you need.

RAW PAGE
<nav class="site-header">
  <a href="/">Home</a> <a href="/about">About</a> ... 47 more links ...
</nav>
<div class="ad-banner"> <script src="ads.js"></script> </div>
<div class="cookie-popup"> <button>Accept All</button> </div>
<article>
  <h1>The Actual Title</h1>
  <p>The content you actually wanted...</p>
</article>
<div class="sidebar"> <div class="related">...</div> <div class="newsletter">...</div> </div>
<footer> ... 200 lines of footer ... </footer>
readability extraction
SPIDER OUTPUT
{
  "url": "https://example.com/post",
  "status": 200,
  "metadata": {
    "title": "The Actual Title",
    "description": "A clear ...",
    "keywords": ["web", "dev"],
    "og_image": "https://..."
  },
  "content": "# The Actual Title\n\nThe content you actually wanted, in clean markdown.\n\nNo nav. No ads. No cookie banners.\n..."
}

Every source, same shape

Reuters wraps articles in a React app. Substack uses server-rendered HTML. Dev.to has an API, but it returns a different schema. Your newsletter tool expects one format. Spider normalizes all of them into the same markdown-plus-metadata shape.
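That "same shape" can be modeled with a small type. The field names below come from the sample output earlier on this page; the filtering helper is an illustrative sketch, not part of any official client:

```typescript
// Types modeled on the sample response shown above (assumed, not verified).
interface PageMetadata {
  title: string;
  description?: string;
  keywords?: string[];
  og_image?: string;
}

interface CrawledPage {
  url: string;
  status: number;
  metadata: PageMetadata;
  content: string; // clean markdown
}

// Keep only pages that actually returned usable content,
// regardless of which site they came from.
function usable(pages: CrawledPage[]): CrawledPage[] {
  return pages.filter((p) => p.status === 200 && p.content.trim().length > 0);
}
```

Because every source arrives in this one shape, the downstream newsletter or indexing code never branches on where a page came from.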

What your codebase loses (and gains)

deleted 14 files, 3,200 lines
📁 scrapers/
📄 reuters.ts
📄 medium.ts
📄 substack.ts
📄 techcrunch.ts
📄 devto.ts
📄 ... 4 more
📁 formatters/
📄 normalize-html.ts
📄 rss-parser.ts
📄 metadata-extract.ts
📄 puppeteer-pool.ts
📄 selector-registry.json
added 1 file, 12 lines
📄 content-feed.ts

Import the client, pass your URLs as a comma-separated list, get clean markdown. The entire aggregation layer is one function that calls spider.crawlUrl() with readability: true and metadata: true.


Built for teams that ship content daily

Newsletters

Curate at scale

Pull from your source list, extract the key paragraphs, feed them into your template. What used to take 2 hours of tab-switching becomes one API call.

Research & Intelligence

Track topics across the web

Regulatory updates, competitor announcements, academic pre-prints. Aggregate specialized sources into your analysis workflow or knowledge base.

AI Products

Feed your RAG pipeline

Clean markdown with consistent metadata. Ready to chunk, embed, and retrieve. Keep your AI grounded in current information, not stale training data.
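A sketch of the chunking step that sits between Spider's markdown output and your embedding model. The paragraph-based split and the 800-character budget are arbitrary illustrative choices, not a prescribed strategy:

```typescript
// Sketch: split clean markdown into embedding-sized chunks.
// maxChars and the paragraph boundary heuristic are illustrative choices.
function chunkMarkdown(md: string, maxChars = 800): string[] {
  const paras = md.split(/\n{2,}/); // markdown paragraphs
  const chunks: string[] = [];
  let current = "";
  for (const p of paras) {
    // Start a new chunk when adding this paragraph would exceed the budget.
    if (current && current.length + p.length + 2 > maxChars) {
      chunks.push(current);
      current = p;
    } else {
      current = current ? current + "\n\n" + p : p;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Because the input is already markdown with the nav, ads, and banners stripped, the chunks carry only article text into the vector store.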

Under the hood

RENDERING Core

Smart JavaScript rendering

The default, request: "smart", detects when a page needs JavaScript and automatically falls back to Chrome rendering. For JS-heavy sources, set request: "chrome" to force full browser rendering on every page.
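One way to apply that per source, sketched below. The "smart" and "chrome" values come from the text above; the domain list and the heuristic of forcing Chrome for known JS-heavy hosts are assumptions for illustration:

```typescript
// Sketch: pick a rendering mode per source.
// JS_HEAVY is a hypothetical list of hosts known to need a browser.
const JS_HEAVY = new Set(["substack.com", "medium.com"]);

function renderMode(url: string): "smart" | "chrome" {
  const host = new URL(url).hostname.replace(/^www\./, "");
  return [...JS_HEAVY].some((d) => host === d || host.endsWith("." + d))
    ? "chrome" // force full browser rendering
    : "smart"; // let Spider detect whether JS is needed
}
```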

DELIVERY Webhooks

Push, do not poll

Set up a webhook endpoint and Spider pushes content to your app the moment it is ready. No cron jobs checking for updates on a loop.
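A receiving-side sketch. The payload shape reuses the sample output shown earlier on this page; the in-memory `Map` store stands in for whatever persistence your app uses, and none of this is a documented webhook contract:

```typescript
// Sketch: process a webhook delivery of crawled pages.
// Payload shape assumed from the sample output above; store is illustrative.
interface WebhookPage {
  url: string;
  status: number;
  content: string; // clean markdown
}

function handleWebhook(body: WebhookPage[], store: Map<string, string>): number {
  let saved = 0;
  for (const page of body) {
    if (page.status === 200 && page.content) {
      store.set(page.url, page.content); // persist markdown for the feed
      saved++;
    }
  }
  return saved; // pages accepted from this delivery
}
```

Wire this into whatever HTTP framework serves your app; the point is that content arrives as it is ready, with no polling loop.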

METADATA Extraction

Structured fields on every page

Enable metadata: true to get title, description, keywords, Open Graph image, domain, file size, and resource type on every page. Combine with return_headers: true for full HTTP response headers.

PRICING Flat rate

No per-page surcharges

Crawl 10 URLs or 10,000. No credit multipliers for JavaScript rendering or "premium" domains. Costs stay predictable as your source list grows.

Stop maintaining scrapers. Start shipping content.

One API replaces every custom scraper and format-conversion script in your pipeline. Use it alongside RSS or as a complete replacement when feeds are unavailable or truncated.