Configuration

The SDK uses an immutable ClientConfig dataclass for all configuration. You can pass parameters directly to the constructor, use environment variables, or build a config object manually.

Constructor Parameters

from datablue import DataBlue

client = DataBlue(
    api_url="https://api.datablue.dev",   # Base URL (default: http://localhost:8000)
    api_key="wh_your_api_key",             # API key (wh_ prefix)
    timeout=120.0,                          # Request timeout in seconds (default: 60)
    max_retries=5,                          # Max retry attempts (default: 3)
)

ClientConfig Object

For advanced control, build a ClientConfig and pass it to the constructor:

from datablue import DataBlue, ClientConfig

config = ClientConfig(
    api_url="https://api.datablue.dev",
    api_key="wh_your_api_key",
    timeout=120.0,
    max_retries=5,
    backoff_factor=1.0,                  # Multiplier for exponential backoff (default: 0.5)
)

client = DataBlue(config=config)

Config from Environment

from datablue import AsyncDataBlue, ClientConfig, DataBlue

# Build config from DATABLUE_* env vars
config = ClientConfig.from_env()

# The same config works with either client type
sync_client = DataBlue(config=config)
async_client = AsyncDataBlue(config=config)
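
To make the environment lookup concrete, here is a minimal sketch of how a `from_env()`-style loader could behave. It uses a hypothetical stand-in class, `EnvConfig`, rather than the real `ClientConfig`, and reads only the two variables named in this documentation (`DATABLUE_API_URL` and `DATABLUE_API_KEY`). The injectable `env` mapping is an assumption added to keep the example testable; the SDK's actual implementation may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EnvConfig:
    """Hypothetical stand-in for ClientConfig; not the SDK's actual class."""

    api_url: str = "http://localhost:8000"
    api_key: Optional[str] = None

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "EnvConfig":
        # Read from os.environ unless a mapping is injected (assumption, for tests).
        source = os.environ if env is None else env
        return cls(
            # Fall back to the documented default when the variable is unset,
            # and strip any trailing slash from the base URL.
            api_url=source.get("DATABLUE_API_URL", cls.api_url).rstrip("/"),
            api_key=source.get("DATABLUE_API_KEY"),
        )


config = EnvConfig.from_env(
    {
        "DATABLUE_API_URL": "https://api.datablue.dev/",
        "DATABLUE_API_KEY": "wh_your_api_key",
    }
)
print(config.api_url)
```

Passing a plain dict instead of touching `os.environ` keeps the sketch free of side effects; calling `EnvConfig.from_env()` with no argument reads the real process environment.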

Cloning Configs

Configs are immutable (frozen dataclass). Use clone() to create modified copies for different environments:

from datablue import DataBlue, ClientConfig

# Base config
prod = ClientConfig(
    api_url="https://api.datablue.dev",
    api_key="wh_prod_key",
    timeout=60.0,
    max_retries=3,
)

# Derive staging config (inherits everything except overrides)
staging = prod.clone(
    api_url="https://staging.datablue.dev",
    api_key="wh_staging_key",
)

# Derive a fast config for time-sensitive operations
fast = prod.clone(timeout=10.0, max_retries=1)

# Pass whichever config fits the call site (production shown here)
with DataBlue(config=prod) as client:
    result = client.scrape("https://example.com")
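
For readers unfamiliar with frozen dataclasses, the following sketch shows one way a `clone()` method could be built on the standard library's `dataclasses.replace`. `FrozenConfig` is a hypothetical stand-in that borrows the documented field names and defaults; whether the SDK implements `clone()` this way is an assumption.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class FrozenConfig:
    """Hypothetical stand-in for ClientConfig, using the documented fields."""

    api_url: str = "http://localhost:8000"
    api_key: Optional[str] = None
    timeout: float = 60.0
    max_retries: int = 3
    backoff_factor: float = 0.5

    def clone(self, **overrides) -> "FrozenConfig":
        # dataclasses.replace builds a new instance; the original is untouched.
        return replace(self, **overrides)


prod = FrozenConfig(api_url="https://api.datablue.dev", api_key="wh_prod_key")
fast = prod.clone(timeout=10.0, max_retries=1)

try:
    prod.timeout = 10.0  # Direct assignment fails on a frozen dataclass
    mutated = True
except FrozenInstanceError:
    mutated = False

print(fast.timeout, prod.timeout, mutated)
```

Because the original instance never changes, a base config can be shared safely between clients while each derived copy carries its own overrides.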

ClientConfig Fields

Field	Type	Default	Description
`api_url`	str	http://localhost:8000	Base URL of the DataBlue API (trailing slash auto-stripped)
`api_key`	str | None	None	API key with wh_ prefix
`timeout`	float	60.0	HTTP request timeout in seconds
`max_retries`	int	3	Maximum retry attempts on transient errors (429, 5xx, connection errors)
`backoff_factor`	float	0.5	Multiplier for exponential backoff: delay = factor * 2^attempt
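
The backoff formula in the table can be worked through with a few lines of arithmetic. This sketch assumes `attempt` is zero-based and that no jitter or upper cap is applied; the documentation does not state either detail, so treat the numbers as illustrative. `backoff_delays` is a hypothetical helper, not part of the SDK.

```python
def backoff_delays(backoff_factor: float, max_retries: int) -> list[float]:
    """Delay before each retry, using delay = factor * 2^attempt."""
    # Assumption: attempt starts at 0, so the first retry waits `factor` seconds.
    return [backoff_factor * (2 ** attempt) for attempt in range(max_retries)]


default_delays = backoff_delays(backoff_factor=0.5, max_retries=3)
print(default_delays)       # [0.5, 1.0, 2.0]
print(sum(default_delays))  # 3.5
```

Under these assumptions, the defaults add at most 3.5 seconds of waiting across three retries, while `backoff_factor=1.0` with `max_retries=5` would add up to 31 seconds.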

Self-Hosted Setup

Point the SDK at your self-hosted DataBlue instance by setting api_url:

from datablue import DataBlue

# Direct constructor
with DataBlue(
    api_url="https://scraper.internal.company.com",
    api_key="wh_internal_key",
) as client:
    result = client.scrape("https://example.com")

# Or via environment variables (set in your shell, then read from Python)
export DATABLUE_API_URL=https://scraper.internal.company.com
export DATABLUE_API_KEY=wh_internal_key

from datablue import DataBlue

with DataBlue.from_env() as client:
    result = client.scrape("https://example.com")
    print(result.data.markdown)

Default URL: The SDK defaults to http://localhost:8000, which works out of the box with the Docker Compose development setup. For production deployments, always set the URL explicitly.

Complete API Reference (v2.0.0)

Method	Description
`scrape(url, **opts)`	Scrape a single URL, returns ScrapeResult
`crawl(url, **opts)`	Crawl a site (blocking with polling), returns CrawlStatus
`start_crawl(url, **opts)`	Start crawl (non-blocking), returns CrawlJob
`get_crawl_status(job_id)`	Poll crawl status, returns CrawlStatus
`cancel_crawl(job_id)`	Cancel an in-progress crawl
`crawl_stream(url, **opts)`	Stream crawl pages via NDJSON, returns Iterator[CrawlPageData]
`crawl_stream_with_callback(url, on_document=..., **opts)`	Callback-based crawl streaming (on_document, on_complete, on_error)
`search(query, **opts)`	Search the web (blocking with polling), returns SearchStatus
`start_search(query, **opts)`	Start search (non-blocking), returns SearchJob
`get_search_status(job_id)`	Poll search status, returns SearchStatus
`map(url, **opts)`	Discover URLs on a site, returns MapResult
`batch_scrape(urls, **opts)`	Scrape multiple URLs, returns list[ScrapeResult]
`batch_scrape_iter(urls, **opts)`	Async-only: stream batch results as they complete, returns AsyncIterator[ScrapeResult]
`login(email, password)`	Authenticate with email/password, stores JWT internally
`close()`	Close the HTTP connection pool
`from_env()`	Class method: create client from DATABLUE_* env vars
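
Since `crawl_stream` is described as delivering pages via NDJSON (one JSON object per line), a short sketch of that wire format may help. The helper `iter_ndjson` and the sample payload fields (`url`, `status`) are hypothetical; the real `CrawlPageData` model may carry different fields, and the SDK handles this decoding for you.

```python
import json
from typing import Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:  # Skip blank lines between records
            yield json.loads(line)


# Hypothetical payload; the real CrawlPageData fields may differ.
sample_stream = [
    '{"url": "https://example.com/", "status": 200}\n',
    "\n",
    '{"url": "https://example.com/about", "status": 200}\n',
]

pages = list(iter_ndjson(sample_stream))
print([page["url"] for page in pages])
```

Because each line is a complete JSON document, a consumer can act on every page as soon as it arrives instead of waiting for the whole crawl to finish.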

AI-First Documentation Files

v2.0.0 ships with machine-readable reference files for AI coding assistants:

File	Location	Purpose
`CLAUDE.md`	sdk/CLAUDE.md	Complete SDK quick-reference: all method signatures, response models, error types, and patterns. Read automatically by Claude Code and other AI assistants.
`llms.txt`	sdk/llms.txt	Standardized machine-readable documentation following the llms.txt convention. Condensed API surface for LLM consumption.

Why AI-first? AI coding assistants tend to hallucinate API calls when they lack accurate documentation. The CLAUDE.md file gives them real method signatures, real parameter names, and real response types to work from, so generated code does not depend on guesswork.