POST /v1/extract

Extract

Extract structured data from web pages or raw content using an LLM. Accepts URLs to scrape first, or raw markdown/HTML content directly. Returns typed JSON matching your schema.

Target Latency

1.2s - 4.5s

Credits

5 cr/req

Parameters

Name               Type      Requirement  Description
url                string    optional     Single URL to scrape then extract from.
urls               string[]  optional     Multiple URLs to scrape and extract from (async job).
content            string    optional     Raw markdown/text content to extract from (no scraping needed).
html               string    optional     Raw HTML to convert and extract from.
prompt             string    optional     Natural language extraction instruction (e.g. 'Extract all product names and prices').
schema             object    optional     JSON Schema for structured output. The LLM will return data matching this schema.
provider           string    optional     LLM provider: "openai", "anthropic", "groq", etc.
only_main_content  boolean   optional     Extract only main content before LLM processing.
wait_for           number    optional     Wait ms after page load (for URLs).
timeout            number    optional     Scrape timeout in ms (for URLs).
use_proxy          boolean   optional     Use proxy for scraping.
headers            object    optional     Custom HTTP headers.
cookies            object    optional     Custom cookies.
webhook_url        string    optional     Webhook URL for extraction completion notification.
webhook_secret     string    optional     HMAC secret for webhook signature verification.
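
The webhook_secret parameter implies that the receiver checks an HMAC signature on each notification. This page does not state the hash algorithm, the signature encoding, or which header carries the signature, so the sketch below assumes a hex-encoded HMAC-SHA256 over the raw request body. The function name verify_webhook_signature is hypothetical; confirm the actual scheme before relying on it.

```python
import hashlib
import hmac


def verify_webhook_signature(secret: str, raw_body: bytes, received_signature: str) -> bool:
    """Check a webhook signature against the shared secret.

    ASSUMPTION: the signature is a hex-encoded HMAC-SHA256 of the raw
    request body. The page above does not document the algorithm.
    """
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids timing side channels.
    return hmac.compare_digest(expected.encode("utf-8"), received_signature.encode("utf-8"))
```

Verify against the exact bytes received, before any JSON parsing, since re-serializing the body can change whitespace and break the comparison.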

cURL Example

curl -X POST "https://api.datablue.dev/v1/extract" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "url": "https://openai.com/pricing",
  "prompt": "Extract all pricing tiers with name, price per million tokens, and context window",
  "schema": {
    "type": "object",
    "properties": {
      "tiers": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "name": {
              "type": "string"
            },
            "input_price": {
              "type": "string"
            },
            "output_price": {
              "type": "string"
            },
            "context_window": {
              "type": "string"
            }
          }
        }
      }
    }
  }
}'
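
The same request can be built in Python with only the standard library. The sketch below uses the content parameter instead of url, so no scraping step is involved. The helper name build_extract_request and the sample plan text are illustrative assumptions, not part of the API.

```python
import json
import urllib.request

API_URL = "https://api.datablue.dev/v1/extract"


def build_extract_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/extract."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Sample input made up for illustration: raw markdown passed via "content".
payload = {
    "content": "# Plans\n\nStarter: $9/month\nPro: $29/month",
    "prompt": "Extract all plan names and prices",
    "schema": {
        "type": "object",
        "properties": {
            "plans": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "price": {"type": "string"},
                    },
                },
            }
        },
    },
}

request = build_extract_request("YOUR_API_KEY", payload)

# To send the request:
# with urllib.request.urlopen(request, timeout=30) as resp:
#     result = json.load(resp)
```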

System Responses

200 OK

Request processed successfully.

401 UNAUTHORIZED

Missing or invalid API key.

429 RATE LIMIT

System capacity exceeded.

500 SYSTEM FAILURE

Internal core exception.
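
A client can map these status codes to distinct behaviors. The sketch below retries 429 and 500 with exponential backoff and fails immediately on 401. The retry policy, the delays, and the call_with_retry helper are assumptions for illustration; this page does not say whether a 500 is safe to retry or whether the API returns a Retry-After header.

```python
import time
from typing import Callable, Tuple


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call send() until it returns 200, backing off on 429 and 500.

    send is any function that performs the request and returns
    (status_code, response_body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        if status == 401:
            # Retrying cannot fix a missing or invalid API key.
            raise PermissionError("Missing or invalid API key")
        if status in (429, 500) and attempt < max_attempts - 1:
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * 2 ** attempt)
            continue
        raise RuntimeError(f"Extract request failed with status {status}")
    raise RuntimeError("retry loop exited unexpectedly")
```

Passing sleep as a parameter keeps the helper testable without real waiting.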

EXAMPLE RESPONSE
{
  "success": true,
  "data": {
    "url": "https://openai.com/pricing",
    "extract": {
      "tiers": [
        {
          "name": "GPT-4o",
          "input_price": "$2.50/1M",
          "output_price": "$10.00/1M",
          "context_window": "128K"
        },
        {
          "name": "GPT-4o mini",
          "input_price": "$0.15/1M",
          "output_price": "$0.60/1M",
          "context_window": "128K"
        },
        {
          "name": "GPT-4.1",
          "input_price": "$2.00/1M",
          "output_price": "$8.00/1M",
          "context_window": "1M"
        }
      ]
    },
    "content_length": 48230
  }
}
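
On the client side, the extract object can be loaded into typed records. The sketch below parses a trimmed copy of the response above into dataclasses. The Tier class and parse_tiers function are hypothetical names. Because the example schema declares no required fields, missing keys default to an empty string here.

```python
import json
from dataclasses import dataclass
from typing import List

TIER_FIELDS = ("name", "input_price", "output_price", "context_window")


@dataclass
class Tier:
    name: str
    input_price: str
    output_price: str
    context_window: str


def parse_tiers(response_text: str) -> List[Tier]:
    """Turn the JSON body of a successful extract call into Tier records."""
    doc = json.loads(response_text)
    if not doc.get("success"):
        raise ValueError("extract call did not succeed")
    raw_tiers = doc["data"]["extract"].get("tiers", [])
    # Fields absent from the LLM output fall back to "".
    return [Tier(**{f: t.get(f, "") for f in TIER_FIELDS}) for t in raw_tiers]


# Trimmed copy of the example response shown above.
sample = """
{
  "success": true,
  "data": {
    "url": "https://openai.com/pricing",
    "extract": {
      "tiers": [
        {"name": "GPT-4o", "input_price": "$2.50/1M",
         "output_price": "$10.00/1M", "context_window": "128K"},
        {"name": "GPT-4o mini", "input_price": "$0.15/1M",
         "output_price": "$0.60/1M", "context_window": "128K"}
      ]
    },
    "content_length": 48230
  }
}
"""

tiers = parse_tiers(sample)
```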