WebCrawler.buzz — Free Online Web Crawler

API Documentation

Full REST API reference for WebCrawler.buzz. Start crawls, track progress, retrieve results, and export data — all programmatically.

Base URL: https://webcrawler.buzz

POST /api/crawl

Start a new crawl. The endpoint validates the URL, creates a job, pushes it to the crawl queue, and immediately returns a tracking URL.

Request Body (JSON)

{
  "url": "https://example.com"
}

Response (201 — New Crawl)

{
  "status": "started",
  "job_id": "abc-123-def",
  "tracking_url": "https://webcrawler.buzz/results?job_id=abc-123-def",
  "message": "Crawl started! Use tracking_url to follow progress."
}

Response (200 — Domain Already Crawled)

{
  "status": "exists",
  "job_id": "xyz-789",
  "domain": "example.com",
  "total_pages": 42,
  "tracking_url": "https://webcrawler.buzz/results?job_id=xyz-789",
  "prompt": "Send action 're-crawl' or 'use-existing' to POST /api/crawl/decide"
}
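
The two responses above call for different follow-up steps, so a client may want to branch on both the HTTP status and the "status" field. The following is a minimal Python sketch using only the standard library. The helper names (build_crawl_request, next_step, start_crawl), the Content-Type header, and the 30-second timeout are assumptions for illustration and are not part of the documented API.

```python
import json
import urllib.request

BASE_URL = "https://webcrawler.buzz"


def build_crawl_request(target_url: str) -> urllib.request.Request:
    """Build (without sending) the POST /api/crawl request."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/crawl",
        data=body,
        # Assumption: the API expects a JSON content type for JSON bodies.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def next_step(http_status: int, payload: dict) -> str:
    """Map the two documented responses to what the caller should do next."""
    if http_status == 201 and payload.get("status") == "started":
        return "track"   # follow payload["tracking_url"] or poll progress
    if http_status == 200 and payload.get("status") == "exists":
        return "decide"  # call POST /api/crawl/decide
    raise ValueError(f"Undocumented response: {http_status} {payload!r}")


def start_crawl(target_url: str) -> tuple[str, dict]:
    """Send the request and return (next step, decoded JSON payload)."""
    request = build_crawl_request(target_url)
    with urllib.request.urlopen(request, timeout=30) as resp:
        payload = json.load(resp)
        return next_step(resp.status, payload), payload
```

Calling start_crawl("https://example.com") performs the network request; the other two helpers are pure and can be exercised offline.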

POST /api/crawl/decide

When POST /api/crawl reports that a domain was already crawled (status "exists"), choose whether to re-crawl it or use the existing results.

Request Body (JSON)

{
  "domain": "example.com",
  "action": "re-crawl"
}

Set "action" to either "re-crawl" or "use-existing".
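
Because only two action values are documented, it can help to validate the action on the client before sending. The sketch below builds the request body from an "exists" response; the function names are hypothetical, and how the server reacts to other values is not documented here.

```python
import json

# The two actions named in the documentation for POST /api/crawl/decide.
VALID_ACTIONS = ("re-crawl", "use-existing")


def build_decide_body(domain: str, action: str) -> bytes:
    """Return the UTF-8 JSON body for POST /api/crawl/decide."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"action must be one of {VALID_ACTIONS}, got {action!r}")
    return json.dumps({"domain": domain, "action": action}).encode("utf-8")


def decide_body_from_exists(exists_payload: dict, action: str) -> bytes:
    """Reuse the "domain" field from an "exists" response of POST /api/crawl."""
    return build_decide_body(exists_payload["domain"], action)
```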

GET /api/crawl/:job_id/progress

Lightweight progress polling. Returns current status and completion percentage.

Response

{
  "job_id": "abc-123-def",
  "status": "running",
  "total_pages_found": 25,
  "total_pages_queued": 100,
  "progress_percent": 25
}
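
A polling loop built on this endpoint might look like the sketch below. Only the "running" status appears in this documentation, so the loop treats any other status as terminal; that rule, the 2-second interval, the poll cap, and the function names are all assumptions.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://webcrawler.buzz"


def fetch_progress(job_id: str) -> dict:
    """GET /api/crawl/:job_id/progress and decode the JSON body."""
    url = f"{BASE_URL}/api/crawl/{job_id}/progress"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def poll_until_done(
    job_id: str,
    fetch: Callable[[str], dict] = fetch_progress,
    interval_s: float = 2.0,
    max_polls: int = 300,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the status is no longer "running", then return that response.

    Assumption: every status other than "running" means the job has stopped.
    """
    for _ in range(max_polls):
        progress = fetch(job_id)
        if progress.get("status") != "running":
            return progress
        sleep(interval_s)
    raise TimeoutError(f"Job {job_id} still running after {max_polls} polls")
```

The fetch and sleep parameters are injectable so the loop can be tested without network access or real delays.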

GET /api/crawl/:job_id?page=1&limit=50

Get paginated crawl results. Supports 50, 100, 500, or 1000 results per page.

Query Parameters

Param   Type      Default   Description
page    integer   1         Page number
limit   integer   50        Results per page (50, 100, 500, or 1000)
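
Since limit accepts only four values, a small URL builder can reject anything else before a request is made. This is a sketch under stated assumptions: the function name is hypothetical, and the requirement that page be at least 1 is inferred from the default rather than documented.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://webcrawler.buzz"
ALLOWED_LIMITS = (50, 100, 500, 1000)  # page sizes listed in the documentation


def results_url(job_id: str, page: int = 1, limit: int = 50) -> str:
    """Build the URL for GET /api/crawl/:job_id with validated paging values."""
    if page < 1:
        # Assumption: pages are 1-based, matching the documented default of 1.
        raise ValueError("page must be >= 1")
    if limit not in ALLOWED_LIMITS:
        raise ValueError(f"limit must be one of {ALLOWED_LIMITS}")
    query = urlencode({"page": page, "limit": limit})
    return f"{BASE_URL}/api/crawl/{quote(job_id, safe='')}?{query}"
```

For example, results_url("abc-123-def", page=2, limit=100) returns "https://webcrawler.buzz/api/crawl/abc-123-def?page=2&limit=100".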

POST /api/crawl/:job_id/notify

Register an email address to receive a notification when the crawl completes.

Request Body (JSON)

{
  "email": "user@example.com"
}

POST /api/crawl/:job_id/export

Request a CSV export of the crawl results. A download link will be sent to the provided email address.

Request Body (JSON)

{
  "email": "user@example.com"
}
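
The notify and export endpoints take the same one-field body, so a single helper can serve both. The sketch below only builds the request; the helper name, the Content-Type header, and the placeholder address are assumptions, and the response format of these endpoints is not documented here.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://webcrawler.buzz"
EMAIL_ACTIONS = ("notify", "export")  # the two email-based endpoints


def build_email_request(job_id: str, action: str, email: str) -> urllib.request.Request:
    """Build POST /api/crawl/:job_id/notify or /export with an email body."""
    if action not in EMAIL_ACTIONS:
        raise ValueError(f"action must be one of {EMAIL_ACTIONS}")
    body = json.dumps({"email": email}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/crawl/{quote(job_id, safe='')}/{action}",
        data=body,
        # Assumption: JSON bodies are sent with a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> int:
    """Send a prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return resp.status
```

A CSV export could then be requested with send(build_email_request("abc-123-def", "export", "user@example.com")), where the address is a placeholder.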

Operational Endpoints

GET /api/health

Returns system health status including database and Redis connectivity.

GET /api/stats

Returns public usage statistics (total crawls, pages crawled, etc.).

⚠️ Rate Limits

POST endpoints are rate-limited to 200 requests per 15 minutes per IP. GET endpoints (progress polling, results) are not rate-limited.
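
A client that issues many POST requests can stay under this limit by tracking its own send times. The sketch below uses a sliding window of 200 requests per 15 minutes; whether the server counts with a sliding or a fixed window is not stated, so this is a conservative client-side guard, and the class name is hypothetical.

```python
import time
from collections import deque
from typing import Callable


class PostRateLimiter:
    """Client-side sliding-window guard for POST requests."""

    def __init__(
        self,
        max_requests: int = 200,        # documented POST limit per IP
        window_s: float = 15 * 60,      # documented 15-minute window
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_requests = max_requests
        self._window_s = window_s
        self._clock = clock
        self._sent: deque[float] = deque()  # timestamps of recent POSTs

    def try_acquire(self) -> bool:
        """Record a POST and return True if one may be sent now, else False."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self._window_s:
            self._sent.popleft()
        if len(self._sent) < self._max_requests:
            self._sent.append(now)
            return True
        return False
```

Call try_acquire() before each POST and delay the request when it returns False. GET requests need no such guard according to the limits above.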