News
reuters.com, apnews.com, bbc.co.uk
Blogs
dev.to, medium.com, substack.com
Docs
docs.aws.amazon.com, developer.mozilla.org
Unified Output
Clean markdown, structured JSON, ready for your app
Media & Publishing
Aggregate Content
from Multiple Sources
Build news feeds, research platforms, or content curation tools with automated web aggregation. Spider crawls your sources, extracts clean content, and delivers it in a unified format. No RSS required.
Why Spider
The old way is broken
The Bottleneck
- Content is spread across many different sites
- Each source has different HTML structures
- RSS feeds are incomplete or unavailable
- Real-time updates require constant polling
A Better Way
- Crawl any website, no RSS required
- Automatic content extraction in clean markdown
- Batch processing for regular updates
- Webhook delivery for real-time integration
Features
Built for content aggregation
Multi-Source Crawling
Crawl dozens of sources in parallel with a single API call. News sites, blogs, documentation portals, research databases. Spider handles the complexity of different HTML structures and returns uniform, clean data.
Readability Extraction
Automatically extract the main content, removing ads, navigation, and clutter. Get just the article text.
Metadata Parsing
Extract titles, authors, publish dates, and images from each article automatically.
Deduplication
Automatic URL normalization prevents duplicate content from polluting your feed.
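The idea behind URL normalization can be sketched in a few lines of TypeScript. This is an illustrative approximation, not Spider's internal algorithm: it lowercases the host, drops fragments and common tracking parameters, and strips trailing slashes so near-identical URLs collapse to one canonical key.

```typescript
// Illustrative URL normalization for deduplication (not Spider's internal logic).
// URLs differing only in case, trailing slash, fragment, or tracking
// parameters collapse to the same canonical key.
function normalizeUrl(raw: string): string {
  const url = new URL(raw)
  url.hostname = url.hostname.toLowerCase()
  url.hash = ""
  // Drop common tracking parameters
  for (const param of [...url.searchParams.keys()]) {
    if (param.startsWith("utm_") || param === "ref") {
      url.searchParams.delete(param)
    }
  }
  // Strip trailing slashes from the path (keep "/" for the root)
  url.pathname = url.pathname.replace(/\/+$/, "") || "/"
  return url.toString()
}

// Keep only the first occurrence of each canonical URL
function dedupe(urls: string[]): string[] {
  const seen = new Set<string>()
  return urls.filter((u) => {
    const key = normalizeUrl(u)
    if (seen.has(key)) return false
    seen.add(key)
    return true
  })
}
```

With this, `https://Example.com/post/?utm_source=feed` and `https://example.com/post` count as the same article.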
Batch Updates
Process all of your sources in a single API call. Efficient, fast, and straightforward.
Webhook Delivery
Push new content to your app as soon as it is crawled. No polling, no delays. Your pipeline stays fresh.
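On the receiving side, a webhook handler can be as simple as parsing the delivered batch and storing whatever is new. The payload shape below (url, content, metadata) mirrors the response format shown in Quick Start; the handler and its in-memory store are a sketch, not a prescribed integration.

```typescript
// Minimal webhook-consumer sketch. The page shape (url/content/metadata)
// mirrors the crawl response shown in Quick Start; the Map is a stand-in
// for your database.
interface CrawledPage {
  url: string
  content: string
  metadata?: { title?: string; description?: string }
}

const store = new Map<string, CrawledPage>()

// Called for each webhook delivery; returns how many pages were new.
function handleWebhook(payload: CrawledPage[]): number {
  let added = 0
  for (const page of payload) {
    if (!store.has(page.url)) {
      store.set(page.url, page)
      added++
    }
  }
  return added
}
```

Because the handler keys on URL, redelivered batches are idempotent: replaying the same payload adds nothing.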
AI Extraction
Use natural language prompts to extract exactly the data you need from each page. Spider returns structured JSON, shaped the way you want it.
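In practice this means attaching a natural-language prompt to the crawl request and asking for JSON back. The field names below (`return_format`, `prompt`) are illustrative assumptions, not the documented API surface; check the Spider docs for the exact parameters.

```typescript
// Hypothetical request options for prompt-based extraction. The field
// names are assumptions for illustration, not Spider's documented API.
interface ExtractionOptions {
  return_format: string
  prompt: string
}

function buildExtractionRequest(prompt: string): ExtractionOptions {
  return {
    return_format: "json", // ask for structured JSON back
    prompt,                // natural-language description of the fields you want
  }
}

const opts = buildExtractionRequest(
  "Return the article title, author, and a one-sentence summary as JSON"
)
```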
Quick Start
Aggregate in one call
Pass a list of URLs. Get clean, unified content back. That is the whole workflow.
// Aggregate from multiple sources
import Spider from '@spider-cloud/spider-client'

const spider = new Spider()

const sources = [
  "https://reuters.com/technology",
  "https://dev.to/latest",
  "https://news.ycombinator.com",
  "https://blog.cloudflare.com",
]

const results = await spider.crawlUrl(sources, {
  return_format: "markdown",
  readability: true,
  metadata: true,
}) // Clean, unified response
[
  {
    "url": "https://reuters.com/technology",
    "content": "# Latest Tech News\n\n...",
    "metadata": {
      "title": "Technology News",
      "description": "Latest..."
    }
  },
  {
    "url": "https://dev.to/latest",
    "content": "# Dev Community\n\n...",
    "metadata": { ... }
  },
  // ... all sources, same format
]
How It Works
Multiple streams, one clean output
1. Point to your sources
Pass a list of URLs you want to aggregate. News sites, blogs, documentation, knowledge bases, anything with a URL.
2. Spider does the work
Parallel crawling, content extraction, metadata parsing, and deduplication. Handled automatically at scale.
3. Get clean data back
Structured markdown or JSON, uniform across every source. Ready for your news feed, research tool, or AI pipeline.
Explore More