POST /v1/extract
Extract
Extract structured data from web pages or raw content using an LLM. Provide a URL to scrape first, or pass raw Markdown or HTML content directly. Returns typed JSON matching your schema.
Target Latency
1.2s - 4.5s
Credits
5 cr/req
Parameters
| Name | Type | Requirement | Description |
|---|---|---|---|
| url | string | optional | Single URL to scrape then extract from. |
| urls | string[] | optional | Multiple URLs to scrape and extract from (async job). |
| content | string | optional | Raw markdown/text content to extract from (no scraping needed). |
| html | string | optional | Raw HTML to convert and extract from. |
| prompt | string | optional | Natural language extraction instruction (e.g. 'Extract all product names and prices'). |
| schema | object | optional | JSON Schema for structured output. The LLM will return data matching this schema. |
| provider | string | optional | LLM provider: "openai", "anthropic", "groq", etc. |
| only_main_content | boolean | optional | Extract only main content before LLM processing. |
| wait_for | number | optional | Time to wait after page load, in ms (for URLs). |
| timeout | number | optional | Scrape timeout in ms (for URLs). |
| use_proxy | boolean | optional | Use proxy for scraping. |
| headers | object | optional | Custom HTTP headers. |
| cookies | object | optional | Custom cookies. |
| webhook_url | string | optional | Webhook URL for extraction completion notification. |
| webhook_secret | string | optional | HMAC secret for webhook signature verification. |
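The table lists `content` as an alternative to `url` when the text is already in hand. The following is a minimal Python sketch of such a request, using only the standard library. The endpoint, headers, and parameter names come from this page; the catalog text, `product_schema`, and the helper name `build_extract_request` are hypothetical and exist only for illustration.

```python
import json
import urllib.request

API_URL = "https://api.datablue.dev/v1/extract"  # endpoint documented on this page


def build_extract_request(
    api_key: str, content: str, prompt: str, schema: dict
) -> urllib.request.Request:
    """Build (but do not send) a POST request that extracts from raw content."""
    payload = {
        "content": content,  # raw markdown/text, so no scraping step is involved
        "prompt": prompt,
        "schema": schema,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical schema and content, for illustration only.
product_schema = {
    "type": "object",
    "properties": {
        "products": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "string"},
                },
            },
        }
    },
}

request = build_extract_request(
    api_key="YOUR_API_KEY",
    content="# Catalog\n\n- Widget: $4.00\n- Gadget: $9.50",
    prompt="Extract all product names and prices",
    schema=product_schema,
)

# Sending is left commented out so the sketch runs without network access:
# with urllib.request.urlopen(request, timeout=30) as response:
#     result = json.load(response)
```

Building the request separately from sending it makes the payload easy to inspect or log before any credits are spent.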
cURL Example
curl -X POST "https://api.datablue.dev/v1/extract" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://openai.com/pricing",
    "prompt": "Extract all pricing tiers with name, price per million tokens, and context window",
    "schema": {
      "type": "object",
      "properties": {
        "tiers": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": {"type": "string"},
              "input_price": {"type": "string"},
              "output_price": {"type": "string"},
              "context_window": {"type": "string"}
            }
          }
        }
      }
    }
  }'
System Responses
200 OK
Request processed successfully.
401 UNAUTHORIZED
Missing or invalid API key.
429 RATE LIMIT
Too many requests; system capacity exceeded.
500 SYSTEM FAILURE
Internal core exception.
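A client can treat the status codes above differently: 401 signals a problem with the key and will not improve on retry, while 429 and 500 may be transient. The sketch below shows one possible retry loop with exponential backoff in Python. Which codes are retryable, the delay values, and the helper names `backoff_delays` and `send_with_retry` are assumptions, not documented behavior of this API.

```python
import time
import urllib.error

RETRYABLE_STATUSES = {429, 500}  # assumption: these two are worth retrying


def backoff_delays(max_attempts, base_seconds=1.0, cap_seconds=30.0):
    """Delays between attempts: 1s, 2s, 4s, ... capped at cap_seconds."""
    return [
        min(cap_seconds, base_seconds * 2 ** attempt)
        for attempt in range(max_attempts - 1)
    ]


def send_with_retry(send, max_attempts=4, sleep=time.sleep):
    """Call send() until it returns, or re-raise when retrying cannot help.

    `send` is any zero-argument callable that raises urllib.error.HTTPError
    on a non-2xx response, e.g. lambda: urllib.request.urlopen(request).
    """
    delays = backoff_delays(max_attempts)
    for attempt in range(max_attempts):
        try:
            return send()
        except urllib.error.HTTPError as error:
            is_last_attempt = attempt == max_attempts - 1
            if error.code not in RETRYABLE_STATUSES or is_last_attempt:
                raise  # 401 and exhausted retries surface to the caller
            sleep(delays[attempt])
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.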
Example Response
{
  "success": true,
  "data": {
    "url": "https://openai.com/pricing",
    "extract": {
      "tiers": [
        {
          "name": "GPT-4o",
          "input_price": "$2.50/1M",
          "output_price": "$10.00/1M",
          "context_window": "128K"
        },
        {
          "name": "GPT-4o mini",
          "input_price": "$0.15/1M",
          "output_price": "$0.60/1M",
          "context_window": "128K"
        },
        {
          "name": "GPT-4.1",
          "input_price": "$2.00/1M",
          "output_price": "$8.00/1M",
          "context_window": "1M"
        }
      ]
    },
    "content_length": 48230
  }
}
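The extracted values sit under `data.extract`, shaped by the schema sent with the request. Below is a minimal Python sketch of reading such a response into typed objects. It assumes the envelope shown above; the `PricingTier` class, the helper names, and the handling of a `success` value other than `true` are hypothetical, since this page does not show a failure body. The schema in the example request marks no field as required, so missing fields default to empty strings here.

```python
import json
from dataclasses import dataclass


@dataclass
class PricingTier:
    name: str
    input_price: str
    output_price: str
    context_window: str


def parse_tiers(response_body: str) -> list[PricingTier]:
    """Read the tiers out of a response shaped like the example above."""
    envelope = json.loads(response_body)
    if not envelope.get("success"):
        # Assumption: a failed extraction does not report success as true.
        raise ValueError("extraction did not succeed")
    tiers = envelope["data"]["extract"]["tiers"]
    return [
        PricingTier(
            name=tier.get("name", ""),
            input_price=tier.get("input_price", ""),
            output_price=tier.get("output_price", ""),
            context_window=tier.get("context_window", ""),
        )
        for tier in tiers
    ]


def price_per_million(price: str) -> float:
    """Convert a string such as "$2.50/1M" to a float number of dollars."""
    amount, _, _unit = price.partition("/")
    return float(amount.lstrip("$"))


# Shortened copy of the example response, used as sample input.
sample_body = """
{
  "success": true,
  "data": {
    "url": "https://openai.com/pricing",
    "extract": {
      "tiers": [
        {"name": "GPT-4o", "input_price": "$2.50/1M",
         "output_price": "$10.00/1M", "context_window": "128K"},
        {"name": "GPT-4o mini", "input_price": "$0.15/1M",
         "output_price": "$0.60/1M", "context_window": "128K"}
      ]
    },
    "content_length": 48230
  }
}
"""

tiers = parse_tiers(sample_body)
cheapest = min(tiers, key=lambda tier: price_per_million(tier.input_price))
```

Because the example schema types every field as a string, numeric work such as the comparison above needs an explicit conversion step; declaring `number` fields in the schema is an alternative worth considering.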