Error Handling

The SDK raises typed exceptions for all API errors. Every exception inherits from DataBlueError, so you can catch all SDK errors with a single handler or handle specific types individually. The HTTP client automatically retries transient errors (429 and 5xx) with exponential backoff, and raises an exception only once those retries are exhausted.

Exception Hierarchy

DataBlueError                         # Base exception for all SDK errors
    AuthenticationError               # 401 — bad or missing API key / JWT
    NotFoundError                     # 404 — resource does not exist
    RateLimitError                    # 429 — rate limit exceeded (retryable)
    ServerError                       # 5xx — server error (retryable)
    JobFailedError                    # Polled job completed with "failed" status
    TimeoutError                      # Polling timeout exceeded
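
The hierarchy above determines how `except` clauses should be ordered: specific types first, the base class last. The following is a minimal sketch using stand-in classes that only mirror the documented names; it is not the SDK's source, and `JobTimeoutError` and `describe` are hypothetical names chosen for illustration.

```python
# Stand-in classes mirroring the documented hierarchy (NOT the SDK's source).
class DataBlueError(Exception):
    """Base class: a handler for this type also catches every subclass."""


class AuthenticationError(DataBlueError):
    pass


class NotFoundError(DataBlueError):
    pass


class RateLimitError(DataBlueError):
    pass


class ServerError(DataBlueError):
    pass


class JobFailedError(DataBlueError):
    pass


class JobTimeoutError(DataBlueError):
    """Stand-in for the SDK's TimeoutError, renamed here so the sketch
    does not shadow Python's built-in TimeoutError."""


def describe(exc: Exception) -> str:
    """Show which handler wins: specific clauses first, base class last."""
    try:
        raise exc
    except RateLimitError:
        return "rate-limited"
    except DataBlueError:
        return "other SDK error"
    except Exception:
        return "not an SDK error"
```

If the `DataBlueError` clause were listed first, it would also match `RateLimitError`, and the more specific handler would never run.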

Basic Error Handling

from datablue import (
    DataBlue,
    DataBlueError,
    AuthenticationError,
    RateLimitError,
    NotFoundError,
    ServerError,
    JobFailedError,
    TimeoutError,  # shadows Python's built-in TimeoutError within this module
)

with DataBlue(api_key="wh_your_api_key") as client:
    try:
        result = client.scrape("https://example.com")
    except AuthenticationError as e:
        print(f"Auth failed: {e.message}")
        print(f"Status: {e.status_code}")        # 401
        print(f"Docs: {e.docs_url}")              # https://docs.datablue.dev/errors/authentication
    except RateLimitError as e:
        print(f"Rate limited: {e.message}")
        print(f"Retry after: {e.retry_after}s")   # seconds to wait
        print(f"Retryable: {e.is_retryable}")     # True
    except NotFoundError as e:
        print(f"Not found: {e.message}")           # 404
    except ServerError as e:
        print(f"Server error ({e.status_code}): {e.message}")
        print(f"Retryable: {e.is_retryable}")     # True
    except DataBlueError as e:
        print(f"API error: {e.message}")
        print(f"Status: {e.status_code}")
        print(f"Body: {e.response_body}")
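
Once the SDK's own retries are exhausted, an application may still want to decide whether to try again later. The sketch below reads only the documented `is_retryable` and `retry_after` attributes; the helper name `suggested_wait`, its `default` parameter, and the `FakeRateLimitError` stand-in are assumptions made for illustration, not part of the SDK.

```python
from typing import Optional


def suggested_wait(exc: Exception, default: float = 1.0) -> Optional[float]:
    """Return how many seconds to wait before retrying, or None if the
    error should not be retried.

    Uses getattr so it works with any exception object, including ones
    that do not carry the documented attributes.
    """
    if not getattr(exc, "is_retryable", False):
        return None
    retry_after = getattr(exc, "retry_after", None)
    return float(retry_after) if retry_after is not None else default


class FakeRateLimitError(Exception):
    """Stand-in carrying the two documented attributes (not the SDK class)."""

    def __init__(self, retry_after: Optional[float]) -> None:
        super().__init__("Rate limit exceeded.")
        self.is_retryable = True
        self.retry_after = retry_after
```

A caller could pass the returned value to `time.sleep` before retrying, and re-raise the exception when the helper returns `None`.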

Job Errors (Crawl / Search)

from datablue import DataBlue, JobFailedError, TimeoutError

with DataBlue(api_key="wh_your_api_key") as client:
    try:
        status = client.crawl(
            "https://example.com",
            max_pages=100,
            timeout=60.0,       # raise TimeoutError if the job is not done within 60s
        )
    except TimeoutError as e:
        print(f"Timed out after {e.elapsed:.1f}s")
        print(f"Job ID: {e.job_id}")
        # Optionally cancel the still-running job
        client.cancel_crawl(e.job_id)
    except JobFailedError as e:
        print(f"Job failed: {e.message}")
        print(f"Job ID: {e.job_id}")
        print(f"Response: {e.response_body}")
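
To make the two job exceptions concrete, here is a rough sketch of how a polling loop could produce them. It is not the SDK's implementation: `poll_until_done`, `PollTimeout`, `JobFailed`, the `interval` parameter, and the `"completed"` and `"running"` status strings are all assumptions. Only the `"failed"` status and the `job_id` and `elapsed` attributes come from the documentation above.

```python
import time
from typing import Callable


class PollTimeout(Exception):
    """Stand-in for the SDK's TimeoutError (hypothetical name)."""

    def __init__(self, job_id: str, elapsed: float) -> None:
        super().__init__(f"Job {job_id} did not complete within {elapsed:.1f}s")
        self.job_id = job_id
        self.elapsed = elapsed


class JobFailed(Exception):
    """Stand-in for the SDK's JobFailedError (hypothetical name)."""

    def __init__(self, job_id: str) -> None:
        super().__init__(f"Job {job_id} failed")
        self.job_id = job_id


def poll_until_done(
    get_status: Callable[[str], str],
    job_id: str,
    timeout: float,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status(job_id) until the job finishes, fails, or times out."""
    start = clock()
    while True:
        status = get_status(job_id)
        if status == "failed":
            raise JobFailed(job_id)
        if status == "completed":  # assumed name for the success status
            return status
        elapsed = clock() - start
        if elapsed >= timeout:
            # The job itself may still be running server-side at this point.
            raise PollTimeout(job_id, elapsed)
        sleep(interval)
```

The timeout only stops the client from waiting; it does not stop the job, which is why the example above cancels the crawl explicitly after catching the exception.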

Exception Attributes

Attribute      Type          Available On                  Description
message        str           All                           Human-readable error description
status_code    int | None    All                           HTTP status code (if from API response)
response_body  dict | None   All                           Raw API response body
is_retryable   bool          All                           Whether the request can be safely retried
retry_after    float | None  RateLimitError                Seconds to wait before retrying
docs_url       str | None    All                           Link to documentation for this error type
job_id         str | None    JobFailedError, TimeoutError  Job ID that failed or timed out
elapsed        float | None  TimeoutError                  Seconds elapsed before timeout
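
Because the attributes form a fixed set, an exception can be flattened into a single record for structured logging. The following is an illustrative sketch, not an SDK feature: `error_to_dict`, `DOCUMENTED_ATTRIBUTES`, and the `FakeTimeout` stand-in are hypothetical names, and attributes an exception does not carry are reported as `None`.

```python
from typing import Any, Dict

# Attribute names taken from the table above.
DOCUMENTED_ATTRIBUTES = (
    "message",
    "status_code",
    "response_body",
    "is_retryable",
    "retry_after",
    "docs_url",
    "job_id",
    "elapsed",
)


def error_to_dict(exc: Exception) -> Dict[str, Any]:
    """Collect the documented attributes into a plain dict for logging."""
    record: Dict[str, Any] = {"type": type(exc).__name__}
    for name in DOCUMENTED_ATTRIBUTES:
        # getattr with a default covers attributes that exist only on
        # some exception types (for example job_id or elapsed).
        record[name] = getattr(exc, name, None)
    return record


class FakeTimeout(Exception):
    """Stand-in with a few of the documented attributes (not the SDK class)."""

    def __init__(self) -> None:
        self.message = "Job crawl-abc123 did not complete within 300s."
        super().__init__(self.message)
        self.job_id = "crawl-abc123"
        self.elapsed = 300.0
```

The resulting dict can be handed to `json.dumps` or a logging call, provided `response_body` is itself serializable.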

AI-Friendly Error Messages

In v2.0.0, error messages include fix suggestions directly in the message text, which makes them useful to both humans and AI coding assistants:

# AuthenticationError message includes fix instructions:
# "Authentication failed. Set DATABLUE_API_KEY environment variable
#  or pass api_key to DataBlue(api_key='wh_...')"

# RateLimitError includes wait time:
# "Rate limit exceeded. Wait 42s before retrying,
#  or reduce request frequency."

# TimeoutError includes fix suggestion:
# "Job crawl-abc123 did not complete within 300s.
#  Try increasing the timeout parameter."

# ServerError indicates auto-retry:
# "Server error (502). This request will be automatically retried."

Automatic Retries

The SDK automatically retries on transient errors before raising an exception:

  • 429 (Rate Limit) — waits for the Retry-After header, or uses exponential backoff (max 30s)
  • 5xx (Server Error) — exponential backoff: 0.5s, 1s, 2s (max 10s per wait)
  • Connection errors — same exponential backoff as 5xx
  • Max retries: 3 by default, configurable via the max_retries parameter
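
The delays listed above are consistent with a doubling schedule that starts at 0.5s and is capped. The sketch below reproduces that arithmetic under stated assumptions: the function names, the 0-based `attempt` index, and the base-2 growth are inferred from the listed values rather than taken from the SDK. Whether the SDK adds jitter or also caps a server-supplied Retry-After value is not stated, so neither is modeled here.

```python
from typing import Optional


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 10.0) -> float:
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


def rate_limit_delay(
    attempt: int, retry_after: Optional[float], cap: float = 30.0
) -> float:
    """Prefer the server's Retry-After value; otherwise fall back to
    exponential backoff with the larger 30s cap."""
    if retry_after is not None:
        return retry_after
    return backoff_delay(attempt, cap=cap)


# With the default of 3 retries, the waits for 5xx and connection errors are:
schedule = [backoff_delay(n) for n in range(3)]  # [0.5, 1.0, 2.0]
```

With the default of three retries the 10s cap is never reached; it only comes into play if max_retries is raised to six or more, since the sixth wait would otherwise be 16s.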