Perplexity Accused of Stealth AI Web Crawling Tactics by Cloudflare

Ars Technica

Internet infrastructure giant Cloudflare has accused AI search engine Perplexity of employing “stealth tactics” to bypass website owners’ explicit instructions against web crawling, according to a report published on August 4, 2025. The allegations, detailed in a Cloudflare blog post, are that Perplexity uses undeclared crawlers that mimic ordinary browser traffic to scrape content from sites that have blocked its official bots via robots.txt files.

Cloudflare, which manages a significant portion of the web’s traffic, said it detected the covert activity by monitoring unusual patterns in user agents and IP addresses. Websites often block Perplexity’s declared crawlers, such as “PerplexityBot.” Cloudflare alleges that when this happens, Perplexity switches to more surreptitious methods, including rotating IP addresses across various providers and altering user agents to appear as standard Chrome browsers on macOS, effectively disguising automated scraping as human visits. Cloudflare’s CEO, Matthew Prince, likened Perplexity’s behavior to that of “North Korean hackers,” emphasizing the breach of trust and of internet etiquette.

This is not the first time Perplexity has faced such accusations. Earlier reports from Wired and Forbes alleged similar scraping practices despite explicit blocks. The controversy highlights a growing tension between AI companies, which require vast amounts of data for their models, and publishers seeking to protect their intellectual property and control how their content is used. The dispute over scraping without consent has escalated into legal challenges: the BBC sent Perplexity a cease-and-desist letter in June 2025, demanding the deletion of scraped content and compensation, and Dow Jones has filed suit over similar concerns.

The “robots.txt” file is a long-standing web standard designed to communicate website owners’ preferences for how web crawlers should interact with their sites. While not legally binding, it has been widely considered an ethical guideline for web crawling. Cloudflare argues that Perplexity’s alleged actions violate these established web crawling norms.
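
To make the mechanism concrete, here is a minimal sketch of how a robots.txt rule is evaluated, using Python's standard `urllib.robotparser`. The robots.txt contents, the `example.com` URL, the browser user-agent string, and the `is_allowed` helper are all illustrative assumptions, not taken from Cloudflare's report or from any real site. The sketch shows why the file depends on honest self-identification: a rule that names a declared bot does not match a client that presents itself as an ordinary browser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks the declared bot by name, allows everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

# Illustrative Chrome-on-macOS user agent (assumed, not from the report).
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://example.com/article"
declared = is_allowed(ROBOTS_TXT, "PerplexityBot", url)
disguised = is_allowed(ROBOTS_TXT, BROWSER_UA, url)

print("declared bot allowed:", declared)
print("browser-like client allowed:", disguised)
```

Because the check is keyed on the name a client reports, robots.txt only restrains crawlers that identify themselves truthfully and choose to comply. Enforcement against disguised traffic has to happen elsewhere, for example at the network level.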

In response to these findings and growing concerns, Cloudflare has delisted Perplexity as a “verified bot” and implemented new managed rules to automatically block this stealth crawling activity. Cloudflare is also moving toward a “Pay per Crawl” initiative, which will block AI crawlers by default for new sites on its network unless explicit permission is granted, and which could allow content owners to monetize access to their data for AI training. This shift aims to give publishers more control and establish a more transparent economic model for AI data acquisition.

Perplexity, however, has denied Cloudflare’s claims, with a spokesperson stating that “no content was actually accessed” and suggesting the traffic in question did not originate from its systems. The dispute nonetheless underscores the unsettled ethical and legal questions that arise as AI technologies reshape how information is accessed and used online.

Perplexity Accused of Stealth AI Web Crawling Tactics by Cloudflare - OmegaNext AI News