Get leads from websites
Use AI to extract lead information such as emails, phone numbers, and more from websites consistently.
Our system handles:
- gathering the data with crawls
- advanced page filtering
- AI-enhanced data extraction with OpenAI and open-source models
- contact management
How It Works
Traditional contact extraction requires maintaining CSS selectors that break when sites update their HTML, plus handling anti-bot protections. Spider eliminates both problems. The crawler handles access and anti-bot measures, while AI models extract contact data from the page content regardless of HTML structure.
UI (Extracting Contacts)
You can use the UI on the dashboard to extract contacts after you have crawled a page. Go to the page you want to extract from and click the horizontal dropdown menu to display the option to extract contacts. The crawler fetches the page first to check whether anything has changed. If a contact is found, you will get a notification, usually within 10-60 seconds, that the extraction is complete, along with the data.

After extraction, if the page has contact-related data, you can view it in a grid in the app.

The grid displays the name, email, phone, title, and host (the website where the contact was found) of each contact.

API Extracting Usage
The /pipeline/extract-contacts endpoint extracts all contacts from a website concurrently.
API Extracting Example
To extract contacts from a website, follow the example below. All params are optional except url. Use the prompt param to adjust how the AI handles the extraction. If you set the store_data param, or if the website already exists in the dashboard, the contact data will be saved with the page.
import json
import os

import requests

# Read the API key from the environment rather than hard-coding it.
headers = {
    'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
    'Content-Type': 'application/json',
}
json_data = {"limit": 1, "url": "http://www.example.com/contacts", "model": "gpt-4o", "prompt": "A custom prompt to tailor the extracting."}
response = requests.post(
    'https://api.spider.cloud/v1/pipeline/extract-contacts',
    headers=headers,
    json=json_data,
    stream=True,
)
# The response is streamed, so parse each non-empty line as JSON.
for line in response.iter_lines():
    if line:
        print(json.loads(line))
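The loop in the example above stops with an exception if any streamed line is not valid JSON. A minimal sketch of a more defensive parser follows, assuming the stream is newline-delimited JSON as the example implies. The helper name iter_json_lines is hypothetical and not part of any Spider client.

```python
import json
from typing import Any, Iterable, Iterator


def iter_json_lines(lines: Iterable[bytes]) -> Iterator[Any]:
    """Yield one parsed value per non-empty line, skipping lines that fail to parse."""
    for raw in lines:
        # Skip keep-alive blanks and whitespace-only lines.
        if not raw or not raw.strip():
            continue
        try:
            # json.loads accepts bytes; ValueError covers both malformed
            # JSON (JSONDecodeError) and undecodable bytes (UnicodeDecodeError).
            parsed = json.loads(raw)
        except ValueError:
            continue
        yield parsed


# Usage with the streamed response from the example above:
# for record in iter_json_lines(response.iter_lines()):
#     print(record)
```

Silently skipping bad lines is a design choice for this sketch. Logging or counting them may be preferable when partial data matters.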
Pipelines Combo
Pipelines add a new option to data curation workflows. Combining the API endpoints so that extraction only runs on pages likely to contain contacts can save credits. One approach is to gather all the links first with the /links endpoint. Then pass those links to /pipeline/filter-links with a custom prompt, which uses AI to reduce the noise in the set of links before they are processed by /pipeline/extract-contacts.
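The gather, filter, extract flow described above can be sketched as a small orchestration function. This is only an outline of the control flow: get_links, filter_links, and extract_contacts are hypothetical wrappers you would write around /links, /pipeline/filter-links, and /pipeline/extract-contacts, and the request and response shapes they normalize are assumptions, not taken from the API reference.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical wrapper signatures. Each wrapper is expected to call the
# matching endpoint and normalize its response into these simple shapes.
GetLinks = Callable[[str], Iterable[str]]                 # wraps /links
FilterLinks = Callable[[list[str], str], Iterable[str]]   # wraps /pipeline/filter-links
ExtractContacts = Callable[[str], Iterable[dict]]         # wraps /pipeline/extract-contacts


def contacts_pipeline(
    start_url: str,
    get_links: GetLinks,
    filter_links: FilterLinks,
    extract_contacts: ExtractContacts,
    prompt: str,
) -> Iterator[dict]:
    """Run extraction only on links that survive the AI filter step."""
    # Step 1: gather links, dropping duplicates while keeping their order.
    links = list(dict.fromkeys(get_links(start_url)))
    if not links:
        return
    # Step 2: let the filter step reduce the set to likely contact pages.
    candidates = filter_links(links, prompt)
    # Step 3: extract contacts only from the remaining pages.
    for url in candidates:
        yield from extract_contacts(url)
```

Passing the endpoint wrappers in as arguments keeps the flow independent of the exact payloads, so each wrapper can be adjusted to the real response format without touching the orchestration.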
*note: pipeline routes are deprecated.