JSON Scraping
Many websites embed structured data directly in their HTML: product details in JSON-LD, page data in Next.js __NEXT_DATA__ scripts, or API payloads serialized for SSR hydration. Spider can extract this structured data for you, so you do not have to parse the HTML yourself. Set return_json_data: true to get the embedded JSON from any page that contains it.
Example JSON-LD Microformat
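To make the format concrete, here is a minimal sketch of a page that embeds JSON-LD, together with the manual parsing that return_json_data is meant to save you from. The sample HTML, the recipe values, and the names JsonLdParser and extract_json_ld are all made up for illustration; only the Python standard library is used.

```python
import json
from html.parser import HTMLParser

# Hypothetical sample page: a recipe described with schema.org JSON-LD.
SAMPLE_HTML = """
<html>
  <head>
    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "Recipe",
      "name": "Banana Bread",
      "recipeYield": "1 loaf",
      "recipeIngredient": ["3 ripe bananas", "2 cups flour"]
    }
    </script>
  </head>
  <body><h1>Banana Bread</h1></body>
</html>
"""


class JsonLdParser(HTMLParser):
    """Collects the decoded contents of every application/ld+json script tag."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        # Script bodies arrive here as raw text; buffer them until the tag closes.
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            self.items.append(json.loads("".join(self._buffer)))


def extract_json_ld(html):
    """Return a list with one decoded object per JSON-LD script in the page."""
    parser = JsonLdParser()
    parser.feed(html)
    return parser.items
```

Calling extract_json_ld(SAMPLE_HTML) returns a one-element list holding the recipe object. This is the kind of embedded data that Spider returns when return_json_data is enabled.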
Extracting JSON-LD from a Page
Here is a practical example: extracting recipe data from a page that embeds JSON-LD structured data. The return_format: "empty" parameter tells Spider to skip the HTML content and return only the extracted JSON.
Scraping JSON from HTML Using API in Python
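A minimal sketch of such a request, using only the standard library. The parameters return_json_data and return_format come from the text above. The endpoint path, the Bearer authorization header, the SPIDER_API_KEY environment variable, and the target URL are assumptions made for illustration, so check them against the Spider API reference before relying on them.

```python
import json
import urllib.request

# Assumption: the scrape endpoint lives at this path; confirm it in the API reference.
SPIDER_ENDPOINT = "https://api.spider.cloud/scrape"


def build_request(target_url, api_key):
    """Build a POST request asking Spider for the embedded JSON only."""
    payload = {
        "url": target_url,
        "return_json_data": True,   # include embedded JSON in the response
        "return_format": "empty",   # skip the HTML content
    }
    return urllib.request.Request(
        SPIDER_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: the API key is sent as a Bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def scrape_json(target_url, api_key):
    """Send the request and return the decoded JSON response."""
    request = build_request(target_url, api_key)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


# Example usage (hypothetical URL; needs network access and a valid key):
#   import os
#   data = scrape_json("https://example.com/recipes/banana-bread",
#                      os.environ["SPIDER_API_KEY"])
#   print(json.dumps(data, indent=2))
```

Building the request separately from sending it makes it possible to inspect or log the payload without touching the network.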
Example Response
The JSON will be in the response under the other_scripts array:
Example Response JSON
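As a sketch of how the other_scripts array might be consumed, the helper below searches a decoded response for every other_scripts key instead of assuming where it is nested. The sample response is hypothetical and trimmed for illustration: the json_data wrapper, the surrounding fields, and the recipe values are assumptions, and the names find_key and json_ld_items are invented here.

```python
import json

# Hypothetical response, trimmed for illustration; the real field layout may differ.
SAMPLE_RESPONSE = json.loads("""
[
  {
    "url": "https://example.com/recipes/banana-bread",
    "status": 200,
    "content": "",
    "json_data": {
      "other_scripts": [
        {"@context": "https://schema.org", "@type": "Recipe", "name": "Banana Bread"}
      ]
    }
  }
]
""")


def find_key(node, key):
    """Yield every value stored under `key`, at any depth of a decoded JSON value."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            yield from find_key(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


def json_ld_items(response, ld_type=None):
    """Collect objects from every other_scripts array, optionally filtered by @type."""
    items = []
    for scripts in find_key(response, "other_scripts"):
        if not isinstance(scripts, list):
            continue
        for entry in scripts:
            if isinstance(entry, dict) and (ld_type is None or entry.get("@type") == ld_type):
                items.append(entry)
    return items
```

With the sample above, json_ld_items(SAMPLE_RESPONSE, "Recipe") returns the single recipe object, and filtering on a type that is not present returns an empty list.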
Scraping Next.js SSR Data Embedded in the HTML
The same return_json_data parameter also works for the SSR data on Next.js pages and on pages built with similar JavaScript frontend frameworks. In the response for an SSR page, the JSON object is found in the NEXT_DATA property:
Scraping Next.js SSR Data
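A minimal sketch of reading the NEXT_DATA property once the response has been decoded. Next.js places page-level data under props.pageProps, so the helper returns that object. The sample response, including the json_data wrapper and the product values, is hypothetical, and find_first and page_props are names invented for this example.

```python
# Hypothetical response, trimmed for illustration; the real field layout may differ.
SAMPLE_RESPONSE = [
    {
        "url": "https://example.com/products/42",
        "content": "",
        "json_data": {
            "NEXT_DATA": {
                "props": {"pageProps": {"product": {"id": 42, "name": "Desk Lamp"}}},
                "page": "/products/[id]",
                "query": {"id": "42"},
                "buildId": "hypothetical-build-id",
            }
        },
    }
]


def find_first(node, key):
    """Return the first value stored under `key` at any depth, or None if absent."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_first(child, key)
        if found is not None:
            return found
    return None


def page_props(response):
    """Return the Next.js pageProps object, or an empty dict if none is present."""
    next_data = find_first(response, "NEXT_DATA")
    if not isinstance(next_data, dict):
        return {}
    props = next_data.get("props") or {}
    return props.get("pageProps") or {}
```

For the sample above, page_props(SAMPLE_RESPONSE)["product"]["name"] evaluates to "Desk Lamp". A response without NEXT_DATA yields an empty dict, which keeps the calling code free of special cases.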