Crawling an entire website shouldn’t be complicated. Yet in practice, it often is. Many developers rely on fragile custom scripts that break regularly or headless browsers that consume massive amounts of RAM.
To simplify this process, Cloudflare has introduced a new /crawl endpoint (currently in open beta) within its Browser Rendering platform.
The idea is simple: send a URL to the API and Cloudflare handles the rest—discovering pages, rendering them if needed, and returning the content in structured formats such as HTML, Markdown, or JSON.

How the Cloudflare Crawl API Works
Using the new endpoint is straightforward. You send a POST request containing a starting URL, and the service automatically:
- discovers pages via internal links or the sitemap
- renders pages using a headless browser if necessary
- extracts the page content
- returns the results asynchronously
Instead of waiting for the entire crawl to finish, the API returns a job ID. You can then query the API later to retrieve the results.
This asynchronous approach makes it easier to crawl large websites without blocking your application.
Step 1: Create a Cloudflare API Token
Before launching a crawl job, you must create an API token with the proper permissions.
Inside your Cloudflare dashboard, generate a new token with the permission:
Browser Rendering – Edit
You will also need your Account ID, which is visible:
- in the dashboard URL
- or in the Overview section of any domain
Step 2: Launch a Crawl Job
Starting a crawl is as simple as sending a single API request.
Example:
curl -X POST "https://api.cloudflare.com/client/v4/accounts/YOUR_ACCOUNT_ID/browser-rendering/crawl" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
The API returns a job ID such as:
c7f8s2d9-a8e7-4b6e...
By default, the crawler explores:
- 10 pages maximum
- unlimited depth
Both of these defaults can easily be adjusted.
Customizing the Crawl
For more control, you can specify several parameters.
Example configuration:
curl -X POST "https://api.cloudflare.com/client/v4/accounts/YOUR_ACCOUNT_ID/browser-rendering/crawl" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/docs",
"limit": 50,
"depth": 3,
"formats": ["markdown"],
"render": false,
"options": {
"includePatterns": ["https://example.com/docs/**"],
"excludePatterns": ["**/changelog/**"]
}
}'
Important options include:
- limit – maximum number of pages to crawl
- depth – maximum link depth
- formats – output formats to return (HTML, Markdown, or JSON)
- render – whether to render pages in a headless browser
Setting render: false retrieves raw HTML without launching a browser, which is significantly faster for static websites.
During the beta phase, this mode is not billed, making it attractive for experimentation.
Step 3: Retrieve Crawl Results
Once the crawl job starts, you can check its status using a GET request:
curl "https://api.cloudflare.com/client/v4/accounts/YOUR_ACCOUNT_ID/browser-rendering/crawl/YOUR_JOB_ID" \
-H "Authorization: Bearer YOUR_TOKEN"
The response includes:
- job status (running, completed, or errored)
- list of crawled pages
- extracted content in the chosen format
If the results exceed 10 MB, pagination is automatically enabled.
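As an illustration of consuming a completed job, the sketch below maps each crawled URL to its extracted Markdown. The field names (pages, url, markdown) are assumptions for illustration only; check the actual response schema in the beta documentation.

```python
def collect_markdown(result):
    """Map each crawled URL to its Markdown content (assumed field names)."""
    return {page["url"]: page["markdown"] for page in result.get("pages", [])}

# Hand-written sample standing in for a real API response.
sample = {
    "status": "completed",
    "pages": [
        {"url": "https://example.com/", "markdown": "# Home"},
        {"url": "https://example.com/docs", "markdown": "# Docs"},
    ],
}
docs = collect_markdown(sample)
```

For large crawls, the same loop would simply be repeated over each paginated chunk of results.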
Advanced Crawling Options
Cloudflare also includes several advanced parameters designed for more complex scraping workflows.
Incremental Crawling
You can crawl only recently updated pages using the modifiedSince and maxAge parameters.
Sitemap-only Crawling
If you want to avoid parsing internal links, you can specify:
source: "sitemaps"
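As a sketch, these options go in the request body alongside the starting URL. The timestamp format for modifiedSince is an assumption here, not confirmed from the API documentation.

```python
import json

# Hypothetical request body combining incremental and sitemap-only
# crawling. The ISO 8601 value for modifiedSince is an assumption.
payload = {
    "url": "https://example.com/docs",
    "source": "sitemaps",                     # discover URLs from sitemaps only
    "modifiedSince": "2025-01-01T00:00:00Z",  # assumed timestamp format
}
body = json.dumps(payload)
```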
AI-Powered Structured Extraction
The API integrates with Workers AI, allowing structured data extraction via prompts.
For example, you could automatically extract:
- product names
- prices
- stock status
from hundreds of e-commerce pages in a single crawl.
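The exact shape of the Workers AI integration is not shown here, so the payload below is purely hypothetical: it imagines a prompt plus a JSON schema describing the fields to extract. The field names ("extract", "prompt", "schema") are invented for illustration and are not the documented API.

```python
# Purely hypothetical payload: the "extract" block and its field names
# are invented for illustration, NOT the documented Cloudflare API.
payload = {
    "url": "https://shop.example.com",
    "limit": 200,
    "extract": {
        "prompt": "For each product page, extract name, price, and stock status.",
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "price": {"type": "number"},
                "in_stock": {"type": "boolean"},
            },
        },
    },
}
```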
Resource Filtering
To speed up crawling, you can block unnecessary resources using:
rejectResourceTypes
This allows you to ignore:
- images
- fonts
- CSS files
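For example, a request that skips images, fonts, and stylesheets might look like this. The resource-type strings follow the common browser DevTools naming convention and should be treated as assumptions, not values confirmed from the docs.

```python
# Sketch: block heavy static assets during rendering. The resource-type
# names ("image", "font", "stylesheet") follow DevTools conventions and
# are assumptions here.
payload = {
    "url": "https://example.com",
    "rejectResourceTypes": ["image", "font", "stylesheet"],
}
```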
Authentication Support
The authenticate option enables crawling of websites protected by HTTP basic authentication.
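A sketch of a request for a site behind HTTP basic authentication; the nested field names under authenticate are assumptions for illustration.

```python
# Hypothetical shape for the authenticate option (HTTP basic auth);
# the "username"/"password" field names are assumptions.
payload = {
    "url": "https://intranet.example.com",
    "authenticate": {
        "username": "reader",
        "password": "s3cret",
    },
}
```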
Important Limitations to Know
Although powerful, the Cloudflare crawler has a few restrictions.
- A crawl job can run up to 7 days.
- Results are stored for 14 days only.
- The crawler respects robots.txt rules, including crawl-delay.
If a site blocks access, the results will show the URL as “disallowed”, but you’ll need to check the site’s robots.txt manually to understand the restriction.
Available on Cloudflare Workers Plans
The /crawl endpoint is available for both Free and Paid plans of Cloudflare Workers.
Cloudflare also provides additional APIs for:
- webpage screenshots
- PDF generation
- targeted scraping
A Promising Tool for Developers and Data Scrapers
With the introduction of the /crawl endpoint, Cloudflare is clearly positioning its infrastructure as a full-featured web crawling and scraping platform.
A built-in crawler that:
- respects robots.txt
- outputs Markdown or structured JSON
- runs on scalable infrastructure
- and works even on the free plan
is definitely something developers and data engineers will want to keep an eye on.

