Configuration

The SDK uses an immutable ClientConfig dataclass for all configuration. You can pass parameters directly to the constructor, use environment variables, or build a config object manually.

Constructor Parameters

from datablue import DataBlue

client = DataBlue(
    api_url="https://api.datablue.dev",   # Base URL (default: http://localhost:8000)
    api_key="wh_your_api_key",             # API key (wh_ prefix)
    timeout=120.0,                          # Request timeout in seconds (default: 60)
    max_retries=5,                          # Max retry attempts (default: 3)
)

ClientConfig Object

For advanced control, build a ClientConfig and pass it to the constructor:

from datablue import DataBlue, ClientConfig

config = ClientConfig(
    api_url="https://api.datablue.dev",
    api_key="wh_your_api_key",
    timeout=120.0,
    max_retries=5,
    backoff_factor=1.0,                  # Multiplier for exponential backoff (default: 0.5)
)

client = DataBlue(config=config)

Config from Environment

from datablue import AsyncDataBlue, ClientConfig, DataBlue

# Build config from DATABLUE_* env vars
config = ClientConfig.from_env()

# Use with either client type
sync_client = DataBlue(config=config)
async_client = AsyncDataBlue(config=config)
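
To make the environment mapping concrete, here is a minimal sketch of what a from_env()-style loader could look like. It is not the SDK's implementation. Only DATABLUE_API_URL and DATABLUE_API_KEY are named in this guide. The function name config_from_env and the variables DATABLUE_TIMEOUT and DATABLUE_MAX_RETRIES are assumptions made for illustration.

```python
import os
from typing import Any, Dict, Mapping, Optional


def config_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Hypothetical stand-in for ClientConfig.from_env(); returns plain kwargs."""
    env = os.environ if env is None else env
    return {
        # Fallbacks mirror the documented ClientConfig defaults.
        "api_url": env.get("DATABLUE_API_URL", "http://localhost:8000").rstrip("/"),
        "api_key": env.get("DATABLUE_API_KEY"),
        # Assumed variable names: not confirmed by this guide.
        "timeout": float(env.get("DATABLUE_TIMEOUT", "60.0")),
        "max_retries": int(env.get("DATABLUE_MAX_RETRIES", "3")),
    }


if __name__ == "__main__":
    # Pass an explicit mapping so the demo does not depend on the real environment.
    print(config_from_env({"DATABLUE_API_URL": "https://api.datablue.dev/"}))
```

Accepting an explicit mapping keeps the loader easy to test without touching the process environment.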

Cloning Configs

Configs are immutable (frozen dataclass). Use clone() to create modified copies for different environments:

from datablue import DataBlue, ClientConfig

# Base config
prod = ClientConfig(
    api_url="https://api.datablue.dev",
    api_key="wh_prod_key",
    timeout=60.0,
    max_retries=3,
)

# Derive staging config (inherits everything except overrides)
staging = prod.clone(
    api_url="https://staging.datablue.dev",
    api_key="wh_staging_key",
)

# Derive a fast config for time-sensitive operations
fast = prod.clone(timeout=10.0, max_retries=1)

# Pass whichever config fits the job
with DataBlue(config=prod) as client:
    result = client.scrape("https://example.com")
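
The frozen-dataclass pattern behind clone() can be sketched with the standard library alone. SketchConfig below is a hypothetical stand-in that copies the documented field names and defaults. How ClientConfig actually implements clone() is not shown in this guide, so dataclasses.replace is an assumption.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for ClientConfig, using the documented fields."""

    api_url: str = "http://localhost:8000"
    api_key: Optional[str] = None
    timeout: float = 60.0
    max_retries: int = 3
    backoff_factor: float = 0.5

    def clone(self, **overrides) -> "SketchConfig":
        # dataclasses.replace builds a new instance; the original is untouched.
        return replace(self, **overrides)


prod = SketchConfig(api_url="https://api.datablue.dev", api_key="wh_prod_key")
fast = prod.clone(timeout=10.0, max_retries=1)

try:
    prod.timeout = 10.0  # frozen dataclasses reject attribute assignment
except FrozenInstanceError:
    print("prod is immutable; use clone() instead")

print(prod.timeout, fast.timeout)
```

Because every derived config is a new object, a config shared across threads or clients cannot be changed underneath them.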

ClientConfig Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `api_url` | `str` | `http://localhost:8000` | Base URL of the DataBlue API (trailing slash auto-stripped) |
| `api_key` | `str \| None` | `None` | API key with wh_ prefix |
| `timeout` | `float` | `60.0` | HTTP request timeout in seconds |
| `max_retries` | `int` | `3` | Maximum retry attempts on transient errors (429, 5xx, connection errors) |
| `backoff_factor` | `float` | `0.5` | Multiplier for exponential backoff: delay = factor * 2^attempt |
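
To see what the backoff formula means in practice, the sketch below lists the delay before each retry. It assumes attempt numbering starts at 0 and that no jitter or upper cap is applied. Neither detail is stated in this guide, and backoff_delays is a hypothetical helper, not part of the SDK.

```python
from typing import List


def backoff_delays(backoff_factor: float = 0.5, max_retries: int = 3) -> List[float]:
    """Delay in seconds before each retry: factor * 2^attempt."""
    # Assumption: the first retry uses attempt = 0.
    return [backoff_factor * 2 ** attempt for attempt in range(max_retries)]


print(backoff_delays())        # documented defaults -> [0.5, 1.0, 2.0]
print(backoff_delays(1.0, 5))  # -> [1.0, 2.0, 4.0, 8.0, 16.0]
```

Under these assumptions the defaults add at most 3.5 seconds of waiting, while backoff_factor=1.0 with max_retries=5 can add up to 31 seconds.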

Self-Hosted Setup

Point the SDK at your self-hosted DataBlue instance by setting the api_url:

from datablue import DataBlue

# Direct constructor
with DataBlue(
    api_url="https://scraper.internal.company.com",
    api_key="wh_internal_key",
) as client:
    result = client.scrape("https://example.com")

Or via environment variables:

export DATABLUE_API_URL=https://scraper.internal.company.com
export DATABLUE_API_KEY=wh_internal_key

Then create the client from the environment:

from datablue import DataBlue

with DataBlue.from_env() as client:
    result = client.scrape("https://example.com")
    print(result.data.markdown)

Default URL: The SDK defaults to http://localhost:8000, which works out of the box with the Docker Compose development setup. For production deployments, always set the URL explicitly.

Complete API Reference (v2.0.0)

| Method | Description |
| --- | --- |
| `scrape(url, **opts)` | Scrape a single URL, returns ScrapeResult |
| `crawl(url, **opts)` | Crawl a site (blocking with polling), returns CrawlStatus |
| `start_crawl(url, **opts)` | Start crawl (non-blocking), returns CrawlJob |
| `get_crawl_status(job_id)` | Poll crawl status, returns CrawlStatus |
| `cancel_crawl(job_id)` | Cancel an in-progress crawl |
| `crawl_stream(url, **opts)` | Stream crawl pages via NDJSON, returns Iterator[CrawlPageData] |
| `crawl_stream_with_callback(url, on_document=..., **opts)` | Callback-based crawl streaming (on_document, on_complete, on_error) |
| `search(query, **opts)` | Search the web (blocking with polling), returns SearchStatus |
| `start_search(query, **opts)` | Start search (non-blocking), returns SearchJob |
| `get_search_status(job_id)` | Poll search status, returns SearchStatus |
| `map(url, **opts)` | Discover URLs on a site, returns MapResult |
| `batch_scrape(urls, **opts)` | Scrape multiple URLs, returns list[ScrapeResult] |
| `batch_scrape_iter(urls, **opts)` | Async-only: stream batch results as they complete, returns AsyncIterator[ScrapeResult] |
| `login(email, password)` | Authenticate with email/password, stores JWT internally |
| `close()` | Close the HTTP connection pool |
| `from_env()` | Class method: create client from DATABLUE_* env vars |
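
The non-blocking pairs above (start_crawl with get_crawl_status, start_search with get_search_status) imply a polling loop on the caller's side. The helper below is a generic sketch of such a loop, not SDK code. poll_until_done, its parameters, and the status values in the demo are hypothetical, and the fields exposed by CrawlJob and CrawlStatus are not documented here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until_done(
    fetch_status: Callable[[], T],
    is_done: Callable[[T], bool],
    interval: float = 2.0,
    timeout: float = 300.0,
) -> T:
    """Call fetch_status() until is_done(status) is true or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if is_done(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        time.sleep(interval)


if __name__ == "__main__":
    # Fake status source standing in for repeated get_crawl_status(job_id) calls.
    fake_statuses = iter(["running", "running", "completed"])
    final = poll_until_done(
        lambda: next(fake_statuses),
        lambda s: s == "completed",  # assumed terminal value, for the demo only
        interval=0.0,
    )
    print(final)
```

With a real client, fetch_status would wrap get_crawl_status(job_id) and is_done would inspect whichever field of CrawlStatus marks the job as finished.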

AI-First Documentation Files

v2.0.0 ships with machine-readable reference files for AI coding assistants:

| File | Location | Purpose |
| --- | --- | --- |
| CLAUDE.md | sdk/CLAUDE.md | Complete SDK quick-reference: all method signatures, response models, error types, and patterns. Read automatically by Claude Code and other AI assistants. |
| llms.txt | sdk/llms.txt | Standardized machine-readable documentation following the llms.txt convention. Condensed API surface for LLM consumption. |

Why AI-first? AI coding assistants hallucinate API calls when they lack accurate documentation. The CLAUDE.md file ensures AI assistants generate code using real method signatures, real parameter names, and real response types — no guessing.