Cloudflare accuses Perplexity of secret web crawling

Decoder

The dispute between internet infrastructure giant Cloudflare and AI search engine Perplexity has escalated, with Cloudflare publicly accusing Perplexity of covertly crawling websites that had explicitly blocked it. The conflict highlights growing tensions over the ethics of data scraping and the evolving rules of the internet in the age of artificial intelligence.

The conflict began on August 4, 2025, when Cloudflare published a detailed blog post alleging that Perplexity was violating established web standards. Cloudflare said its investigation was prompted by customer complaints: website owners observed that Perplexity was still accessing their content even after they had explicitly blocked the AI service’s official crawlers, “PerplexityBot” and “Perplexity-User,” through robots.txt files or Web Application Firewall (WAF) rules.
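To illustrate what such a robots.txt block looks like in practice, here is a minimal Python sketch using the standard library's robots.txt parser. The robots.txt contents and the example.com URL are assumptions made for illustration, not taken from Cloudflare's post. The sketch also shows that robots.txt is advisory and keyed to the name a crawler declares: it only works if the crawler identifies itself and chooses to comply.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows Perplexity's two declared crawlers
# and allows everyone else. The exact rules are an assumption for this sketch.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots.txt permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article.html"  # placeholder URL
    for agent in ("PerplexityBot", "Perplexity-User", "SomeOtherBot"):
        print(agent, "allowed" if is_allowed(agent, url) else "disallowed")
```

A well-behaved crawler would run a check like this before every fetch and skip disallowed URLs; nothing in the protocol itself enforces the result.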

Cloudflare claims Perplexity employs a “two-pronged approach” to bypass these restrictions. When its declared bots are blocked, the company allegedly deploys undeclared crawlers that impersonate standard web browsers, such as Chrome on macOS, while rotating through unlisted IP addresses and switching between autonomous systems (identified by their ASNs) to avoid detection. Cloudflare said it ran controlled experiments on new, unindexed domains with strict robots.txt and firewall rules and found that Perplexity could still summarize secret content placed behind these restrictions. According to Cloudflare’s report, this “stealth crawling” behavior was observed across tens of thousands of domains and millions of requests per day.
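The following Python sketch shows why a block keyed only to the declared user agent is easy to sidestep, which is the core of Cloudflare's allegation. The filter function, the declared-crawler string, and the browser string are all hypothetical stand-ins chosen for illustration; this is not Cloudflare's detection logic, which the company says relies on additional network signals.

```python
# Tokens a site owner might block in a simple user-agent firewall rule.
# This naive rule is an assumption for the sketch, not a real WAF configuration.
BLOCKED_AGENT_TOKENS = ("perplexitybot", "perplexity-user")


def naive_filter_allows(user_agent: str) -> bool:
    """Allow the request unless the User-Agent header names a blocked crawler."""
    ua = user_agent.lower()
    return not any(token in ua for token in BLOCKED_AGENT_TOKENS)


# Hypothetical header for a crawler that identifies itself honestly.
DECLARED_UA = "Mozilla/5.0 (compatible; PerplexityBot/1.0)"

# Illustrative header resembling an ordinary Chrome browser on macOS.
GENERIC_CHROME_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

if __name__ == "__main__":
    print("declared crawler allowed:", naive_filter_allows(DECLARED_UA))
    print("browser-like client allowed:", naive_filter_allows(GENERIC_CHROME_UA))
```

Because the second request is indistinguishable from a human visitor by its header alone, a defender has to fall back on other signals, such as IP reputation or behavioral patterns, to tell the two apart.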

Perplexity sharply rebutted Cloudflare’s accusations on August 5, calling the technical analysis “fundamentally inadequate” and “disqualifying.” Perplexity argues that Cloudflare misunderstood its technology, asserting that its system relies on “user-driven AI agents” that fetch information in real time for specific user queries rather than on traditional, large-scale web bots. The company stated that when a user asks for current information, the AI visits relevant websites, reads the content, and provides a summary tailored to the question, without storing the data for training. Perplexity also accused Cloudflare of misattributing automated traffic from a third-party service, BrowserBase, to its own systems, claiming it uses this service only occasionally and not for general web scraping.

This dispute highlights a critical distinction and a growing ethical quandary in the AI era: how should AI agents that access websites on a user’s behalf be treated? Cloudflare’s CEO, Matthew Prince, has been vocal about the potential “existential threat” AI models pose to publishers, arguing that AI scraping could damage content creators’ business models by consuming bandwidth without generating referral traffic or revenue. Cloudflare has since removed Perplexity from its “verified bot” program and implemented new measures to block its alleged stealth crawling across its network.

The controversy underscores a broader debate over AI data collection practices, content consent, and intellectual property. While traditional search engines historically sent users back to original sources, AI search engines often summarize content directly, which publishers say leads to a significant drop in referral traffic. This forces website owners into a dilemma: block AI crawlers and risk losing visibility, or allow them and potentially subsidize competitors who profit from their content without compensation. This isn’t Perplexity’s first brush with such accusations; the company has faced prior allegations of plagiarism from media outlets like Wired, is currently being sued by Dow Jones, and has been threatened with legal action by the BBC over content scraping.

The clash between Cloudflare and Perplexity exemplifies the intensifying technical and ethical arms race between AI companies seeking vast datasets and content creators striving to control their digital assets. Its outcome could help define the norms for web interaction and data access.