Crawling an entire website shouldn’t be complicated. Yet in practice, it often is. Many developers rely on fragile custom scripts that break regularly or headless browsers that consume massive amounts of RAM.
To simplify this process, Cloudflare has introduced a new /crawl endpoint (currently in open beta) within its Browser Rendering platform.
The idea is simple: send a URL to the API and Cloudflare handles the rest—discovering pages, rendering them if needed, and returning the content in structured formats such as HTML, Markdown, or JSON.

How the Cloudflare Crawl API Works
Using the new endpoint is straightforward. You send a POST request containing a starting URL, and the service automatically:
- discovers pages via internal links or the sitemap
- renders pages using a headless browser if necessary
- extracts the page content
- returns the results asynchronously
Instead of waiting for the entire crawl to finish, the API returns a job ID. You can then query the API later to retrieve the results.
This asynchronous approach makes it easier to crawl large websites without blocking your application.
Step 1: Create a Cloudflare API Token
Before launching a crawl job, you must create an API token with the proper permissions.
Inside your Cloudflare dashboard, generate a new token with the permission:
Browser Rendering – Edit
You will also need your Account ID, which is visible:
- in the dashboard URL
- or in the Overview section of any domain
Step 2: Launch a Crawl Job
Starting a crawl is as simple as sending a single API request.
Example:
curl -X POST "https://api.cloudflare.com/client/v4/accounts/YOUR_ACCOUNT_ID/browser-rendering/crawl" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
The API returns a job ID such as:
c7f8s2d9-a8e7-4b6e...
By default, the crawler explores:
- 10 pages maximum
- unlimited depth
Both of these defaults can easily be adjusted.
Customizing the Crawl
For more control, you can specify several parameters.
Example configuration:
curl -X POST "https://api.cloudflare.com/client/v4/accounts/YOUR_ACCOUNT_ID/browser-rendering/crawl" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/docs",
"limit": 50,
"depth": 3,
"formats": ["markdown"],
"render": false,
"options": {
"includePatterns": ["https://example.com/docs/**"],
"excludePatterns": ["**/changelog/**"]
}
}'
Important options include:
- limit – maximum number of pages to crawl
- depth – maximum link depth
- formats – output formats to return (HTML, Markdown, or JSON)
- render – whether to render pages in a headless browser
Setting render: false retrieves raw HTML without launching a browser, which is significantly faster for static websites.
During the beta phase, this mode is not billed, making it attractive for experimentation.
Step 3: Retrieve Crawl Results
Once the crawl job starts, you can check its status using a GET request:
curl "https://api.cloudflare.com/client/v4/accounts/YOUR_ACCOUNT_ID/browser-rendering/crawl/YOUR_JOB_ID" \
-H "Authorization: Bearer YOUR_TOKEN"
The response includes:
- job status (running, completed, or errored)
- list of crawled pages
- extracted content in the chosen format
If the results exceed 10 MB, pagination is automatically enabled.
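As an illustration of consuming a completed job, the sketch below maps each crawled URL to its extracted Markdown. The field names (pages, url, markdown) are assumptions for illustration only; check the actual response schema in the beta documentation.

```python
def collect_markdown(result):
    """Map each crawled URL to its Markdown content (assumed field names)."""
    return {page["url"]: page["markdown"] for page in result.get("pages", [])}

# Hand-written sample standing in for a real API response.
sample = {
    "status": "completed",
    "pages": [
        {"url": "https://example.com/", "markdown": "# Home"},
        {"url": "https://example.com/docs", "markdown": "# Docs"},
    ],
}
docs = collect_markdown(sample)
```

For large crawls, the same loop would simply be repeated over each paginated chunk of results.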
Advanced Crawling Options
Cloudflare also includes several advanced parameters designed for more complex scraping workflows.
Incremental Crawling
You can crawl only recently updated pages using the modifiedSince and maxAge parameters.
Sitemap-only Crawling
If you want to avoid parsing internal links, you can specify:
source: "sitemaps"
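As a sketch, these options go in the request body alongside the starting URL. The timestamp format for modifiedSince is an assumption here, not confirmed from the API documentation.

```python
import json

# Hypothetical request body combining incremental and sitemap-only
# crawling. The ISO 8601 value for modifiedSince is an assumption.
payload = {
    "url": "https://example.com/docs",
    "source": "sitemaps",                     # discover URLs from sitemaps only
    "modifiedSince": "2025-01-01T00:00:00Z",  # assumed timestamp format
}
body = json.dumps(payload)
```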
AI-Powered Structured Extraction
The API integrates with Workers AI, allowing structured data extraction via prompts.
For example, you could automatically extract:
- product names
- prices
- stock status
from hundreds of e-commerce pages in a single crawl.
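The exact shape of the Workers AI integration is not shown here, so the payload below is purely hypothetical: it imagines a prompt plus a JSON schema describing the fields to extract. The field names ("extract", "prompt", "schema") are invented for illustration and are not the documented API.

```python
# Purely hypothetical payload: the "extract" block and its field names
# are invented for illustration, NOT the documented Cloudflare API.
payload = {
    "url": "https://shop.example.com",
    "limit": 200,
    "extract": {
        "prompt": "For each product page, extract name, price, and stock status.",
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "price": {"type": "number"},
                "in_stock": {"type": "boolean"},
            },
        },
    },
}
```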
Resource Filtering
To speed up crawling, you can block unnecessary resources using:
rejectResourceTypes
This allows you to ignore:
- images
- fonts
- CSS files
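For example, a request that skips images, fonts, and stylesheets might look like this. The resource-type strings follow the common browser DevTools naming convention and should be treated as assumptions, not values confirmed from the docs.

```python
# Sketch: block heavy static assets during rendering. The resource-type
# names ("image", "font", "stylesheet") follow DevTools conventions and
# are assumptions here.
payload = {
    "url": "https://example.com",
    "rejectResourceTypes": ["image", "font", "stylesheet"],
}
```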
Authentication Support
The authenticate option enables crawling of websites protected by HTTP basic authentication.
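A sketch of a request for a site behind HTTP basic authentication; the nested field names under authenticate are assumptions for illustration.

```python
# Hypothetical shape for the authenticate option (HTTP basic auth);
# the "username"/"password" field names are assumptions.
payload = {
    "url": "https://intranet.example.com",
    "authenticate": {
        "username": "reader",
        "password": "s3cret",
    },
}
```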
Important Limitations to Know
Although powerful, the Cloudflare crawler has a few restrictions.
- A crawl job can run up to 7 days.
- Results are stored for 14 days only.
- The crawler respects robots.txt rules, including crawl-delay.
If a site blocks access, the results will show the URL as “disallowed”, but you’ll need to check the site’s robots.txt manually to understand the restriction.
Available on Cloudflare Workers Plans
The /crawl endpoint is available for both Free and Paid plans of Cloudflare Workers.
Cloudflare also provides additional APIs for:
- webpage screenshots
- PDF generation
- targeted scraping
A Promising Tool for Developers and Data Scrapers
With the introduction of the /crawl endpoint, Cloudflare is clearly positioning its infrastructure as a full-featured web crawling and scraping platform.
A built-in crawler that:
- respects robots.txt
- outputs Markdown or structured JSON
- runs on scalable infrastructure
- and works even on the free plan
is definitely something developers and data engineers will want to keep an eye on.

