Configuration
The SDK stores all settings in ClientConfig, an immutable dataclass. You can pass settings directly to the client constructor, load them from DATABLUE_* environment variables, or build a ClientConfig object yourself and pass it in.
Constructor Parameters
```python
from datablue import DataBlue

client = DataBlue(
    api_url="https://api.datablue.dev",  # Base URL (default: http://localhost:8000)
    api_key="wh_your_api_key",           # API key (wh_ prefix)
    timeout=120.0,                       # Request timeout in seconds (default: 60)
    max_retries=5,                       # Max retry attempts (default: 3)
)
```
ClientConfig Object
For advanced control, build a ClientConfig and pass it to the constructor:
```python
from datablue import DataBlue, ClientConfig

config = ClientConfig(
    api_url="https://api.datablue.dev",
    api_key="wh_your_api_key",
    timeout=120.0,
    max_retries=5,
    backoff_factor=1.0,  # Multiplier for exponential backoff (default: 0.5)
)
client = DataBlue(config=config)
```
Config from Environment
ClientConfig.from_env() reads its settings from DATABLUE_* environment variables. The resulting config works with both the sync and the async client:
```python
from datablue import AsyncDataBlue, ClientConfig, DataBlue

# Build a config from DATABLUE_* env vars (e.g. DATABLUE_API_URL, DATABLUE_API_KEY)
config = ClientConfig.from_env()

# Use with either client type
sync_client = DataBlue(config=config)
async_client = AsyncDataBlue(config=config)
```
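The SDK's own from_env() does this lookup for you. To make the behavior concrete, here is a minimal sketch of how a from_env()-style loader could work, using only the standard library. EnvConfig and config_from_env are hypothetical names, not part of the SDK. This guide names only DATABLUE_API_URL and DATABLUE_API_KEY, so the sketch reads just those two. It ignores timeout and retry settings, whose variable names are not listed here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


# Hypothetical stand-in for ClientConfig; NOT the SDK's real implementation.
@dataclass(frozen=True)
class EnvConfig:
    api_url: str = "http://localhost:8000"
    api_key: Optional[str] = None


def config_from_env(environ: Mapping[str, str] = os.environ) -> EnvConfig:
    """Build an EnvConfig from DATABLUE_* variables, falling back to defaults."""
    defaults = EnvConfig()
    return EnvConfig(
        # Unset variables keep the documented defaults; the trailing slash is
        # stripped to mirror the documented api_url behavior.
        api_url=environ.get("DATABLUE_API_URL", defaults.api_url).rstrip("/"),
        api_key=environ.get("DATABLUE_API_KEY", defaults.api_key),
    )


# Passing an explicit mapping keeps the example independent of the real shell.
cfg = config_from_env({"DATABLUE_API_URL": "https://scraper.internal.company.com/"})
print(cfg.api_url)  # https://scraper.internal.company.com
print(cfg.api_key)  # None
```

Accepting the environment as a parameter, rather than always reading os.environ, is a design choice that makes the loader easy to test.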
Cloning Configs
ClientConfig is a frozen dataclass, so its fields cannot be changed after creation. Use clone() to derive a modified copy, for example one config per environment:
```python
from datablue import DataBlue, ClientConfig

# Base config
prod = ClientConfig(
    api_url="https://api.datablue.dev",
    api_key="wh_prod_key",
    timeout=60.0,
    max_retries=3,
)

# Derive a staging config (inherits every field except the overrides)
staging = prod.clone(
    api_url="https://staging.datablue.dev",
    api_key="wh_staging_key",
)

# Derive a fast config for time-sensitive operations
fast = prod.clone(timeout=10.0, max_retries=1)

# Each config can back its own client; clone() never modifies prod
with DataBlue(config=prod) as client:
    result = client.scrape("https://example.com")

with DataBlue(config=fast) as client:
    quick_result = client.scrape("https://example.com")
```
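To show what "frozen dataclass plus clone()" means mechanically, here is a minimal standard-library sketch. SketchConfig is a hypothetical model, not the SDK's source. It assumes clone() behaves like dataclasses.replace(), which builds a new instance and copies every field you do not override.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional


# Hypothetical model of an immutable config with clone(); not the SDK's source.
@dataclass(frozen=True)
class SketchConfig:
    api_url: str = "http://localhost:8000"
    api_key: Optional[str] = None
    timeout: float = 60.0
    max_retries: int = 3
    backoff_factor: float = 0.5

    def __post_init__(self) -> None:
        # Frozen dataclasses block normal assignment, so normalize via object.__setattr__
        object.__setattr__(self, "api_url", self.api_url.rstrip("/"))

    def clone(self, **overrides) -> "SketchConfig":
        # replace() builds a new instance; unspecified fields are copied from self
        return replace(self, **overrides)


prod = SketchConfig(api_url="https://api.datablue.dev/", api_key="wh_prod_key")
fast = prod.clone(timeout=10.0, max_retries=1)

print(fast.api_url, fast.timeout, fast.max_retries)  # https://api.datablue.dev 10.0 1
print(prod.timeout)  # 60.0 (the original is untouched)

try:
    prod.timeout = 5.0  # type: ignore[misc]
except FrozenInstanceError:
    print("configs cannot be mutated in place")
```

Because every derived config is a separate immutable object, you can share one config between threads or clients without risking accidental changes.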
ClientConfig Fields
| Field | Type | Default | Description |
|---|---|---|---|
| api_url | str | http://localhost:8000 | Base URL of the DataBlue API (trailing slash auto-stripped) |
| api_key | str \| None | None | API key with wh_ prefix |
| timeout | float | 60.0 | HTTP request timeout in seconds |
| max_retries | int | 3 | Maximum retry attempts on transient errors (429, 5xx, connection errors) |
| backoff_factor | float | 0.5 | Multiplier for exponential backoff: delay = factor * 2^attempt |
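To see what the retry defaults mean in practice, the sketch below evaluates delay = factor * 2^attempt and classifies status codes as transient. backoff_delay and is_retryable are illustrative names, not SDK functions. The table does not say whether attempt starts at 0, or whether jitter or a maximum delay is applied. The sketch assumes a 0-based count with no jitter and no cap. Connection errors, which the table also lists as retryable, surface as exceptions rather than status codes, so they are left out.

```python
# Hypothetical helpers illustrating the documented retry policy; not the SDK's code.
# Assumption: attempts are counted from 0, with no jitter and no maximum delay.


def backoff_delay(attempt: int, backoff_factor: float = 0.5) -> float:
    """Delay before retry number `attempt`: factor * 2**attempt."""
    return backoff_factor * (2 ** attempt)


def is_retryable(status_code: int) -> bool:
    """Transient HTTP errors per the table: 429 and any 5xx."""
    return status_code == 429 or 500 <= status_code <= 599


max_retries = 3
print([backoff_delay(a) for a in range(max_retries)])  # [0.5, 1.0, 2.0]
print([backoff_delay(a, 1.0) for a in range(5)])  # [1.0, 2.0, 4.0, 8.0, 16.0]
print(is_retryable(429), is_retryable(503), is_retryable(404))  # True True False
```

Under these assumptions, a request that keeps failing with the defaults (max_retries=3, backoff_factor=0.5) would spend about 0.5 + 1 + 2 = 3.5 seconds waiting between attempts, plus the time spent on the attempts themselves.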
Self-Hosted Setup
Point the SDK at your self-hosted DataBlue instance by setting the api_url:
```python
from datablue import DataBlue

# Pass the self-hosted URL directly to the constructor
with DataBlue(
    api_url="https://scraper.internal.company.com",
    api_key="wh_internal_key",
) as client:
    result = client.scrape("https://example.com")
```
Or set the environment variables in your shell:
```shell
export DATABLUE_API_URL=https://scraper.internal.company.com
export DATABLUE_API_KEY=wh_internal_key
```
Then create the client with from_env():
```python
from datablue import DataBlue

with DataBlue.from_env() as client:
    result = client.scrape("https://example.com")
    print(result.data.markdown)
```
Default URL: The SDK defaults to http://localhost:8000, which works out of the box with the Docker Compose development setup. For production deployments, always set the URL explicitly.
Complete API Reference (v2.0.0)
| Method | Description |
|---|---|
scrape(url, **opts) | Scrape a single URL, returns ScrapeResult |
crawl(url, **opts) | Crawl a site (blocking with polling), returns CrawlStatus |
start_crawl(url, **opts) | Start crawl (non-blocking), returns CrawlJob |
get_crawl_status(job_id) | Poll crawl status, returns CrawlStatus |
cancel_crawl(job_id) | Cancel an in-progress crawl |
crawl_stream(url, **opts) | Stream crawl pages via NDJSON, returns Iterator[CrawlPageData] |
crawl_stream_with_callback(url, on_document=..., **opts) | Callback-based crawl streaming (on_document, on_complete, on_error) |
search(query, **opts) | Search the web (blocking with polling), returns SearchStatus |
start_search(query, **opts) | Start search (non-blocking), returns SearchJob |
get_search_status(job_id) | Poll search status, returns SearchStatus |
map(url, **opts) | Discover URLs on a site, returns MapResult |
batch_scrape(urls, **opts) | Scrape multiple URLs, returns list[ScrapeResult] |
batch_scrape_iter(urls, **opts) | Async-only: stream batch results as they complete, returns AsyncIterator[ScrapeResult] |
login(email, password) | Authenticate with email/password, stores JWT internally |
close() | Close the HTTP connection pool |
from_env() | Class method: create client from DATABLUE_* env vars |
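Two patterns in this table are worth spelling out:

- **Blocking calls.** crawl and search are described as blocking with polling. Conceptually, each behaves like its start_* call followed by repeated get_*_status calls until the job finishes.
- **Streaming.** crawl_stream yields pages parsed from NDJSON, which is one JSON object per line. crawl_stream_with_callback appears to expose the same kind of stream through on_document, on_complete, and on_error.

The sketch below models both patterns with the standard library only. wait_for_job, iter_ndjson, and stream_with_callbacks are hypothetical helpers, not SDK functions. The poll interval, the timeout, and the "state"/"done" status fields are invented for the demo, because this reference does not list the real status values or polling defaults.

```python
import json
import time
from typing import Any, Callable, Dict, Iterable, Iterator, Optional


def wait_for_job(
    start: Callable[[], str],
    get_status: Callable[[str], Dict[str, Any]],
    is_finished: Callable[[Dict[str, Any]], bool],
    poll_interval: float = 2.0,
    timeout: float = 300.0,
) -> Dict[str, Any]:
    """Blocking wrapper: start a job, then poll its status until it finishes."""
    job_id = start()
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_id)
        if is_finished(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
        time.sleep(poll_interval)


def iter_ndjson(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Parse newline-delimited JSON: one object per non-empty line."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def stream_with_callbacks(
    lines: Iterable[str],
    on_document: Callable[[Dict[str, Any]], None],
    on_complete: Optional[Callable[[], None]] = None,
    on_error: Optional[Callable[[Exception], None]] = None,
) -> None:
    """Adapt an NDJSON stream to callbacks. Errors raised by on_document also reach on_error."""
    try:
        for doc in iter_ndjson(lines):
            on_document(doc)
    except Exception as exc:
        if on_error is None:
            raise
        on_error(exc)
    else:
        if on_complete is not None:
            on_complete()


# Fake job backend for demonstration: reports "done" on the third poll
polls = {"count": 0}


def fake_status(job_id: str) -> Dict[str, Any]:
    polls["count"] += 1
    return {"id": job_id, "state": "done" if polls["count"] >= 3 else "running"}


final = wait_for_job(lambda: "job-1", fake_status, lambda s: s["state"] == "done", poll_interval=0.01)
print(final)  # {'id': 'job-1', 'state': 'done'}

pages = []
stream_with_callbacks(
    ['{"url": "https://example.com/a"}\n', "\n", '{"url": "https://example.com/b"}\n'],
    on_document=pages.append,
    on_complete=lambda: print("stream complete"),
)
print([p["url"] for p in pages])  # ['https://example.com/a', 'https://example.com/b']
```

The non-blocking start_* methods matter when you want to do other work between polls. The streaming variants let you process each page as it arrives instead of waiting for the whole crawl.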
AI-First Documentation Files
v2.0.0 ships with machine-readable reference files for AI coding assistants:
| File | Location | Purpose |
|---|---|---|
CLAUDE.md | sdk/CLAUDE.md | Complete SDK quick-reference: all method signatures, response models, error types, and patterns. Read automatically by Claude Code and other AI assistants. |
llms.txt | sdk/llms.txt | Standardized machine-readable documentation following the llms.txt convention. Condensed API surface for LLM consumption. |
Why AI-first? AI coding assistants often hallucinate API calls when they lack accurate documentation. CLAUDE.md and llms.txt give them the SDK's real method signatures, parameter names, and response types, so the code they generate does not have to rely on guesses.