AI Data Extraction
DeprecatedGo beyond raw content extraction. Spider's AI pipelines understand web pages semantically, pulling out contacts and leads, generating question-answer pairs, categorizing websites, and filtering links based on relevance. Crawl + AI in one integrated workflow.
Four AI Pipelines
Extract Contacts
POST /pipeline/extract-contacts Crawl websites and use AI to identify and extract contact information including email addresses, phone numbers, social profiles, and business details. Results are stored and queryable via the contacts data API.
Questions & Answers
POST /pipeline/extract-qa Crawl a website and generate structured Q&A pairs from its content. Provide an inquiry or topic and Spider produces relevant questions with answers grounded in the actual page content.
Label Website
POST /pipeline/label Crawl a website and have AI categorize it into topics, industries, or custom labels. Useful for building directories, classifying leads, or organizing large collections of URLs.
Filter Links
POST /pipeline/filter-links Crawl a website's links and use AI to filter them based on relevance, content type, or custom criteria. Keep only the URLs that match your data collection goals, eliminating noise.
Bonus: Crawl from Text
POST /pipeline/crawl-text Paste raw text or markdown containing URLs, and Spider will automatically extract every link and crawl them. Skip the step of parsing URLs yourself. Just send the document, email body, or notes and let Spider handle discovery. Supports up to 10 MB of input text.
Code Examples
from spider import Spider
client = Spider()
# Extract contacts from a company website
contacts = client.extract_contacts(
"https://example.com",
params={
"limit": 50,
}
)
for contact in contacts:
print(contact) curl -X POST https://api.spider.cloud/pipeline/extract-qa \
-H "Authorization: Bearer $SPIDER_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://docs.example.com",
"limit": 25,
"return_format": "markdown"
}' import Spider from "@spider-cloud/spider-client";
const client = new Spider();
const result = await client.label(
"https://example.com",
{ limit: 10 }
);
console.log(result);
// [{ url: "...", labels: ["Technology", "SaaS", "Developer Tools"] }] Popular AI Extraction Use Cases
Sales Lead Generation
Crawl target company websites to extract emails, phone numbers, and team member details. Build prospect lists automatically instead of manual research.
Fine-Tuning Datasets
Generate Q&A pairs from documentation sites and knowledge bases to create training data for domain-specific language models and chatbots.
Website Directories
Label and categorize large collections of URLs for building topical directories, industry databases, or content recommendation systems.
Smart Link Discovery
Filter a website's links to find only product pages, blog posts, or documentation. Skip navigation, legal pages, and irrelevant content.
Related Resources
Extract intelligence from any website
Combine web crawling with AI understanding. Pull structured data that goes beyond raw HTML.