POST/v1/scrape

Scrape

Scrape a single URL and return the content in your desired format. Uses a 5-tier parallel scraping engine with automatic strategy selection and domain-level strategy caching.

Target Latency

1.2s - 4.5s

Credits

1 cr/req

Parameters

NameTypeRequirementDescription
urlstringREQUIREDThe URL to scrape. Protocol is auto-prepended if missing.
formatsstring[]optionalOutput formats: "markdown", "html", "raw_html", "links", "screenshot", "structured_data", "headings", "images".
only_main_contentbooleanoptionalExtract only the main content, removing navs, footers, sidebars.
mobilebooleanoptionalEmulate a mobile device viewport.
mobile_devicestringoptionalDevice preset name: "iphone_14", "pixel_7", "ipad_pro".
timeoutnumberoptionalRequest timeout in milliseconds.
wait_fornumberoptionalWait this many ms after page load before extracting.
css_selectorstringoptionalOnly extract content matching this CSS selector.
xpathstringoptionalXPath expression for targeted extraction.
include_tagsstring[]optionalOnly include these HTML tags in extraction.
exclude_tagsstring[]optionalExclude these HTML tags from extraction.
use_proxybooleanoptionalRoute request through a configured proxy for anti-bot bypass.
headersobjectoptionalCustom HTTP headers to send (e.g. { "Cookie": "session=abc" }).
cookiesobjectoptionalCustom cookies to send as name/value pairs.
actionsActionStep[]optionalBrowser actions to perform before extraction: click, wait, scroll, type, screenshot, hover, press, select, fill_form, evaluate.
extractobjectoptionalLLM extraction config: { prompt: string, schema: JSONSchema }.
webhook_urlstringoptionalWebhook URL for job completion notification.
webhook_secretstringoptionalHMAC secret for webhook signature verification.
capture_networkbooleanoptionalCapture browser network requests/responses.

cURL Example

curl -X POST "https://api.datablue.dev/v1/scrape" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "url": "https://news.ycombinator.com",
  "formats": [
    "markdown",
    "links"
  ],
  "only_main_content": true
}'

System Responses

200 OK

Request processed successfully.

401 UNAUTHORIZED

Missing or invalid API key.

429 RATE LIMIT

System capacity exceeded.

500 SYSTEM FAILURE

Internal core exception.

EXAMPLE RESPONSE
{
  "success": true,
  "data": {
    "markdown": "# Hacker News\n\n1. Show HN: I built an open-source web scraper with strategy caching\n2. Why Rust is eating the world...",
    "links": [
      "https://news.ycombinator.com/item?id=39912345",
      "https://news.ycombinator.com/item?id=39912346",
      "https://news.ycombinator.com/newest"
    ],
    "metadata": {
      "title": "Hacker News",
      "description": null,
      "language": "en",
      "source_url": "https://news.ycombinator.com",
      "status_code": 200,
      "word_count": 1847,
      "reading_time_seconds": 7,
      "content_length": 12340
    }
  }
}