Map

Discover all URLs on a website by combining sitemap.xml parsing, robots.txt discovery, and link crawling. The map() method returns a flat list of discovered URLs with metadata. This is useful for understanding site structure before launching a targeted crawl.
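The sitemap and robots.txt steps can be made concrete with a minimal standard-library sketch. It reads `Sitemap:` directives from a robots.txt body and pulls `<loc>`, `<lastmod>`, and `<priority>` out of a sitemap. This illustrates the sitemaps.org format only. It is not DataBlue's implementation, and the helper names are hypothetical. Sitemap index files (`<sitemapindex>`) and gzip-compressed sitemaps are not handled.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# XML namespace defined by the sitemaps.org protocol, version 0.9.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemaps_from_robots(robots_txt: str) -> list:
    """Return the URLs named in `Sitemap:` lines of a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() (Python 3.8+) returns None when no Sitemap lines exist.
    return parser.site_maps() or []


def parse_sitemap(xml_text: str) -> list:
    """Extract loc, lastmod, and priority from each <url> in a <urlset>."""
    root = ET.fromstring(xml_text)
    entries = []
    for node in root.findall("sm:url", SITEMAP_NS):
        priority = node.findtext("sm:priority", namespaces=SITEMAP_NS)
        entries.append({
            "url": (node.findtext("sm:loc", namespaces=SITEMAP_NS) or "").strip(),
            "lastmod": node.findtext("sm:lastmod", namespaces=SITEMAP_NS),
            "priority": float(priority) if priority else None,
        })
    return entries


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private\nSitemap: https://example.com/sitemap.xml\n"
    print(sitemaps_from_robots(robots))  # ['https://example.com/sitemap.xml']
```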

Basic Usage

from datablue import DataBlue

with DataBlue(api_key="wh_your_api_key") as client:
    result = client.map("https://docs.python.org")

    print(f"Total URLs: {result.total}")
    for link in result.links:
        print(f"  {link.url}")
        if link.title:
            print(f"    Title: {link.title}")
        if link.lastmod:
            print(f"    Last modified: {link.lastmod}")
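Because `lastmod` is an ISO 8601 string, links can be ordered by recency on the client side. The sketch below uses a hypothetical `Link` dataclass that mirrors two `LinkResult` fields, so it runs without an API key. It compares only the leading date (the first 10 characters), which assumes every `lastmod` begins with a `YYYY-MM-DD` date.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass
class Link:
    """Hypothetical stand-in with the two LinkResult fields used here."""
    url: str
    lastmod: str | None = None


def most_recent(links, n: int = 10) -> list:
    """Return up to n links that have a lastmod, newest first.

    Only the leading YYYY-MM-DD date is compared, so links modified on the
    same day keep their original relative order (the sort is stable).
    """
    dated = [link for link in links if link.lastmod]
    dated.sort(key=lambda link: date.fromisoformat(link.lastmod[:10]), reverse=True)
    return dated[:n]
```

With a real result, `most_recent(result.links)` should behave the same way, since the helper only reads `.lastmod`.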

With Search Filter

# Assumes `client` is an open DataBlue client, as in Basic Usage.
# Only find URLs containing "tutorial"
result = client.map(
    "https://docs.python.org",
    search="tutorial",
    limit=50,
)

print(f"Found {result.total} tutorial URLs")
for link in result.links:
    print(f"  {link.url}")
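`search` takes a single string. If you need several keywords, one option is to map once with a larger `limit` and filter locally. The hypothetical helper below matches keywords case-insensitively against the URL path only, so a keyword such as "python" does not match every URL on docs.python.org through the host name. How the server-side `search` matches is not specified here, so its results may differ from this local filter.

```python
from urllib.parse import urlsplit


def filter_by_path_keywords(urls, keywords) -> list:
    """Keep URLs whose path contains any keyword (case-insensitive).

    Matching the path only avoids hits on the host name, and duplicates
    are dropped while preserving the original order.
    """
    wanted = [k.lower() for k in keywords]
    seen = set()
    matches = []
    for url in urls:
        path = urlsplit(url).path.lower()
        if url not in seen and any(k in path for k in wanted):
            seen.add(url)
            matches.append(url)
    return matches
```

For example, `filter_by_path_keywords(result.urls, ["tutorial", "howto"])` could be applied to a map run without `search`.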

URL Shorthand

Use the urls property to get a flat list of URL strings:

# Assumes `client` is an open DataBlue client, as in Basic Usage.
result = client.map("https://example.com", limit=200)

# Get just the URLs as a plain list
url_list = result.urls    # ["https://example.com/", "https://example.com/about", ...]
print(f"Found {len(url_list)} URLs")

# Start a crawl from the first URL, capped at the number of URLs mapped
crawl = client.crawl(url_list[0], max_pages=len(url_list))
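Before choosing a crawl target, it can help to see which sections of the site hold the most URLs. The hypothetical helper below counts URLs per host and first path segment using only the standard library. With a real result you would pass `result.urls` and call `.most_common()` on the returned `Counter`.

```python
from collections import Counter
from urllib.parse import urlsplit


def section_counts(urls) -> Counter:
    """Count URLs per (host, first path segment), e.g. ("example.com", "blog")."""
    counts = Counter()
    for url in urls:
        parts = urlsplit(url)
        segments = [s for s in parts.path.split("/") if s]
        # The site root ("/") is grouped under an empty segment.
        first = segments[0] if segments else ""
        counts[(parts.netloc, first)] += 1
    return counts
```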

Async

import asyncio

from datablue import AsyncDataBlue


async def main():
    async with AsyncDataBlue(api_key="wh_your_api_key") as client:
        result = await client.map(
            "https://example.com",
            limit=500,
            include_subdomains=True,
        )
        print(f"Found {result.total} URLs")
        for url in result.urls:
            print(url)


asyncio.run(main())
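To map several sites at once, the calls can be run concurrently with `asyncio.gather`, bounded by a semaphore so the API is not flooded. This is a sketch. `map_many` and `fake_map` are hypothetical, and `fake_map` stands in for `(await client.map(site)).urls`. Whether one `AsyncDataBlue` client may serve concurrent requests, and what rate limits apply, are assumptions to check against the SDK.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable


async def map_many(
    sites: list[str],
    map_one: Callable[[str], Awaitable[list[str]]],
    max_concurrency: int = 3,
) -> dict[str, list[str]]:
    """Run map_one for each site with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(site: str) -> list[str]:
        async with semaphore:
            return await map_one(site)

    # gather returns results in the same order as the input sites.
    results = await asyncio.gather(*(bounded(site) for site in sites))
    return dict(zip(sites, results))


async def fake_map(site: str) -> list[str]:
    """Hypothetical stand-in for: (await client.map(site, limit=100)).urls"""
    await asyncio.sleep(0.01)  # simulate network latency
    return [site + "/", site + "/about"]


if __name__ == "__main__":
    print(asyncio.run(map_many(["https://a.example", "https://b.example"], fake_map)))
```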

Parameters

Parameter            Type   Default    Description
url                  str    required   Website URL to map
search               str    None       Filter URLs matching this search string
limit                int    100        Maximum number of URLs to return
include_subdomains   bool   True       Include URLs from subdomains
use_sitemap          bool   True       Parse sitemap.xml for URL discovery
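The exact rule behind `include_subdomains` is not described above. One plausible reading, sketched here with a hypothetical `in_scope` helper, is that a URL is kept when its host equals the start URL's host or, if subdomains are allowed, ends with "." followed by that host. Whether the service treats `www.` specially is unknown.

```python
from urllib.parse import urlsplit


def in_scope(url: str, start_url: str, include_subdomains: bool = True) -> bool:
    """Hypothetical scope rule: same host, or a subdomain of it when allowed."""
    host = (urlsplit(url).hostname or "").lower()
    root = (urlsplit(start_url).hostname or "").lower()
    if host == root:
        return True
    # The leading "." stops "badexample.com" from matching "example.com".
    return include_subdomains and host.endswith("." + root)
```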

Response Model

class MapResult:
    success: bool                        # Whether the map request succeeded
    total: int                           # Total number of links discovered on the site
    links: list[LinkResult]              # List of discovered links with URL and optional metadata
    error: str | None                    # Human-readable error message if the map failed
    job_id: str | None                   # Unique job identifier for async map requests

    # Properties
    urls: list[str]                      # Convenience: [link.url for link in links]

class LinkResult:
    url: str                             # The discovered URL
    title: str | None                    # Page title (from sitemap or page metadata)
    description: str | None              # Page description (from sitemap or meta tags)
    lastmod: str | None                  # Last modification date in ISO 8601 format (from sitemap)
    priority: float | None               # Sitemap priority value between 0.0 and 1.0
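For unit tests that should not call the API, local stand-ins with the same fields can be handy. `StubLink` and `StubMapResult` below are hypothetical classes that mirror the documented fields; they are not the SDK's own types. `urls` follows the documented `[link.url for link in links]` definition, and `high_priority_urls` shows one way to check `success` and filter on sitemap `priority`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StubLink:
    """Hypothetical test double mirroring the documented LinkResult fields."""
    url: str
    title: str | None = None
    description: str | None = None
    lastmod: str | None = None
    priority: float | None = None


@dataclass
class StubMapResult:
    """Hypothetical test double mirroring the documented MapResult fields."""
    success: bool
    total: int
    links: list[StubLink] = field(default_factory=list)
    error: str | None = None
    job_id: str | None = None

    @property
    def urls(self) -> list[str]:
        # Same definition as the documented convenience property.
        return [link.url for link in self.links]


def high_priority_urls(result, threshold: float = 0.8) -> list[str]:
    """Raise on a failed map, else return URLs with sitemap priority >= threshold."""
    if not result.success:
        raise RuntimeError(result.error or "map request failed")
    return [
        link.url
        for link in result.links
        if link.priority is not None and link.priority >= threshold
    ]
```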