Map
Discover all URLs on a website by combining sitemap.xml parsing, robots.txt discovery, and link crawling.
The map() method returns a flat list of discovered URLs with metadata. This is useful for understanding site structure before launching a targeted crawl.
Basic Usage
from datablue import DataBlue
with DataBlue(api_key="wh_your_api_key") as client:
result = client.map("https://docs.python.org")
print(f"Total URLs: {result.total}")
for link in result.links:
print(f" {link.url}")
if link.title:
print(f" Title: {link.title}")
if link.lastmod:
print(f" Last modified: {link.lastmod}")
With Search Filter
# Only find URLs containing "tutorial"
result = client.map(
"https://docs.python.org",
search="tutorial",
limit=50,
)
print(f"Found {result.total} tutorial URLs")
for link in result.links:
print(f" {link.url}")
URL Shorthand
Use the urls property to get a flat list of URL strings:
result = client.map("https://example.com", limit=200)
# Get just the URLs as a plain list
url_list = result.urls # ["https://example.com/", "https://example.com/about", ...]
print(f"Found {len(url_list)} URLs")
# Feed into a crawl or batch scrape
crawl = client.crawl(url_list[0], max_pages=len(url_list))
Async
from datablue import AsyncDataBlue
async with AsyncDataBlue(api_key="wh_your_api_key") as client:
result = await client.map(
"https://example.com",
limit=500,
include_subdomains=True,
)
print(f"Found {result.total} URLs")
for url in result.urls:
print(url)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
url | str | required | Website URL to map |
search | str | None | Filter URLs matching this search string |
limit | int | 100 | Maximum number of URLs to return |
include_subdomains | bool | True | Include URLs from subdomains |
use_sitemap | bool | True | Parse sitemap.xml for URL discovery |
Response Model
class MapResult:
success: bool # Whether the map request succeeded
total: int # Total number of links discovered on the site
links: list[LinkResult] # List of discovered links with URL and optional metadata
error: str | None # Human-readable error message if the map failed
job_id: str | None # Unique job identifier for async map requests
# Properties
urls: list[str] # Convenience: [link.url for link in links]
class LinkResult:
url: str # The discovered URL
title: str | None # Page title (from sitemap or page metadata)
description: str | None # Page description (from sitemap or meta tags)
lastmod: str | None # Last modification date in ISO 8601 format (from sitemap)
priority: float | None # Sitemap priority value between 0.0 and 1.0