PubMed Health Scraper

Extract biomedical literature citations, abstracts, author affiliations, and journal metadata from PubMed. Built on spider-browser.


target: pubmed.ncbi.nlm.nih.gov
success rate: 99.9%
latency: ~4ms

Quick start

Extract data in minutes.

pubmed-health-scraper.ts

import { SpiderBrowser } from "spider-browser";

// Read the API key from the environment.
const spider = new SpiderBrowser({
  apiKey: process.env.SPIDER_API_KEY!,
});

// Open a browser session and load a PubMed search results page.
await spider.connect();
const page = spider.page!;
await page.goto("https://pubmed.ncbi.nlm.nih.gov/?term=diabetes+treatment");

// Runs in the page context: collect title, authors, journal citation, and PMID
// for each search result, then return the first 10 as a JSON string.
const data = await page.evaluate(`(() => {
  const articles = [];
  document.querySelectorAll(".docsum-content").forEach(el => {
    const title = el.querySelector(".docsum-title")?.textContent?.trim();
    const authors = el.querySelector(".docsum-authors")?.textContent?.trim();
    const journal = el.querySelector(".docsum-journal-citation")?.textContent?.trim();
    const pmid = el.closest("[data-docid]")?.getAttribute("data-docid");
    if (title) articles.push({ title, authors, journal, pmid });
  });
  return JSON.stringify({ total: articles.length, articles: articles.slice(0, 10) });
})()`);

// The page script returns a JSON string, so parse it before printing.
console.log(JSON.parse(data));
await spider.close();

ready to run · spider-browser · TypeScript
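
The quick-start script hard-codes its search URL. As a minimal sketch, the URL could instead be built from a free-text query. `buildPubMedSearchUrl` is a hypothetical helper, not part of spider-browser, and the `page` query parameter is an assumption about how PubMed paginates results.

```typescript
// Hypothetical helper (not part of spider-browser): builds a PubMed search URL.
// Assumption: PubMed accepts a 1-based `page` query parameter for pagination.
function buildPubMedSearchUrl(term: string, page = 1): string {
  const query = term.trim();
  if (query === "") throw new Error("term must not be empty");
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be a positive integer");
  }

  const url = new URL("https://pubmed.ncbi.nlm.nih.gov/");
  // URLSearchParams handles encoding; spaces are serialized as "+".
  url.searchParams.set("term", query);
  if (page > 1) url.searchParams.set("page", String(page));
  return url.toString();
}

// → "https://pubmed.ncbi.nlm.nih.gov/?term=diabetes+treatment"
console.log(buildPubMedSearchUrl("diabetes treatment"));
```

The result can be passed to `page.goto(...)` in place of the literal string.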

Fetch API

One endpoint for pubmed.ncbi.nlm.nih.gov.

Structured JSON from pubmed.ncbi.nlm.nih.gov with a single POST. AI-resolved selectors, cached on the first call.

POST /fetch/pubmed.ncbi.nlm.nih.gov/

Article title, Authors, Abstract, Journal, DOI, PMID


cURL

curl -X POST https://api.spider.cloud/fetch/pubmed.ncbi.nlm.nih.gov/ \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"return_format": "json"}'

Python

import requests

resp = requests.post(
    "https://api.spider.cloud/fetch/pubmed.ncbi.nlm.nih.gov/",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={"return_format": "json"},
)
print(resp.json())

Node.js

const resp = await fetch("https://api.spider.cloud/fetch/pubmed.ncbi.nlm.nih.gov/", {
  method: "POST",
  headers: {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ return_format: "json" }),
});
const data = await resp.json();
console.log(data);
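
The three snippets above send the same request and do not check the HTTP status. A small typed wrapper, sketched below, keeps the request in one place and fails loudly on a non-2xx response. `buildFetchRequest`, `fetchStructured`, and `FetchLike` are hypothetical names; only the URL, headers, and body come from the examples above, and the response shape is left as `unknown` because it is not documented here.

```typescript
// Request shape taken from the cURL, Python, and Node.js snippets.
interface FetchRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Minimal structural type for a fetch-like function, so this sketch does not
// depend on DOM or Node typings. The global `fetch` satisfies it.
type FetchLike = (
  url: string,
  init: FetchRequest["init"],
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical helper: assembles the POST request for a given domain.
function buildFetchRequest(
  apiKey: string,
  domain = "pubmed.ncbi.nlm.nih.gov",
): FetchRequest {
  if (apiKey === "") throw new Error("apiKey is required");
  return {
    url: `https://api.spider.cloud/fetch/${domain}/`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ return_format: "json" }),
    },
  };
}

// Hypothetical helper: sends the request and rejects on non-2xx responses.
async function fetchStructured(
  apiKey: string,
  fetchImpl: FetchLike,
): Promise<unknown> {
  const { url, init } = buildFetchRequest(apiKey);
  const resp = await fetchImpl(url, init);
  if (!resp.ok) {
    throw new Error(`Fetch API request failed with HTTP ${resp.status}`);
  }
  return resp.json();
}
```

In Node.js 18+ or a browser, a call would look like `await fetchStructured(process.env.SPIDER_API_KEY ?? "", fetch)`. Passing the fetch function in also makes it easy to substitute a stub in tests.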

Extraction

Fields you can pull.

Article title, Authors, Abstract, Journal, DOI, PMID, Publication date, Keywords
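
One way to work with these fields downstream is to normalize each record into a typed object. The sketch below is illustrative only: the property names (`title`, `authors`, `publicationDate`, and so on) are assumptions, since the exact keys in the JSON response are not listed here, and the DOI and PMID checks are simple sanity checks, not full validation.

```typescript
// Assumed record shape; the real response keys may differ.
interface PubMedArticle {
  title: string;
  authors: string[];
  abstract?: string;
  journal?: string;
  doi?: string;
  pmid?: string;
  publicationDate?: string;
  keywords: string[];
}

// Returns a trimmed, non-empty string, or undefined for anything else.
const asString = (v: unknown): string | undefined =>
  typeof v === "string" && v.trim() !== "" ? v.trim() : undefined;

// Accepts either an array of strings or one comma-separated string.
const asList = (v: unknown): string[] => {
  if (Array.isArray(v)) {
    return v
      .map((item) => asString(item))
      .filter((s): s is string => s !== undefined);
  }
  const s = asString(v);
  return s ? s.split(",").map((p) => p.trim()).filter((p) => p !== "") : [];
};

// Hypothetical normalizer: returns null when the record has no title.
function toArticle(raw: Record<string, unknown>): PubMedArticle | null {
  const title = asString(raw["title"]);
  if (!title) return null;
  const doi = asString(raw["doi"]);
  const pmid = asString(raw["pmid"]);
  return {
    title,
    authors: asList(raw["authors"]),
    abstract: asString(raw["abstract"]),
    journal: asString(raw["journal"]),
    // DOIs start with the "10." directory indicator; PMIDs are numeric.
    doi: doi && doi.startsWith("10.") ? doi : undefined,
    pmid: pmid && /^\d+$/.test(pmid) ? pmid : undefined,
    publicationDate: asString(raw["publicationDate"]),
    keywords: asList(raw["keywords"]),
  };
}
```

Dropping records without a title mirrors the `if (title)` guard in the quick-start script.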

Content

Medical data extraction

Extract citations, abstracts, author affiliations, and journal metadata from pubmed.ncbi.nlm.nih.gov.

Parsing

Structured health data

Each article comes back as structured JSON fields, such as title, authors, journal, DOI, and PMID.

Scale

Bulk research

Process thousands of medical pages for research and comparison datasets.
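
For bulk runs, it usually helps to cap how many requests are in flight at once. The sketch below processes items in fixed-size batches. `chunk` and `processInBatches` are hypothetical helpers, and the batch size is an assumption to tune against whatever rate limits apply to your account.

```typescript
// Splits a list into consecutive batches of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Runs `worker` on every item, one batch at a time. Items within a batch run
// concurrently; the next batch starts only after the current one finishes.
// Results keep the order of the input. A single rejection aborts the run.
async function processInBatches<T, R>(
  items: readonly T[],
  size: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (const batch of chunk(items, size)) {
    const batchResults = await Promise.all(batch.map((item) => worker(item)));
    results.push(...batchResults);
  }
  return results;
}
```

Here `worker` would wrap a single page fetch or endpoint call, for example one search URL per item. Batching is simpler than a sliding-window pool, at the cost of waiting for the slowest item in each batch.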

More Health scrapers.

webmd.com

WebMD Scraper

Extract drug info, condition descriptions, symptom data, and health articles from WebMD.

healthline.com

Healthline Scraper

Extract health articles, condition guides, nutrition data, and wellness content from Healthline.

goodrx.com

GoodRx Scraper

Extract drug pricing, pharmacy comparisons, coupon data, and generic alternatives from GoodRx.

Start

Start scraping pubmed.ncbi.nlm.nih.gov.

Grab an API key and call the endpoint above. The first request resolves the config; every later request reuses the cached config.
