Cloudflare accuses Perplexity of evading no‑crawl rules—publishers, wake up or get played.
Perplexity's crawl credibility goes missing
Let's cut to the chase: Perplexity is facing scrutiny over how it accesses website content. According to Cloudflare, the company may be bypassing robots.txt bans and web application firewall (WAF) blocks by altering its user agents and rotating through various ASNs (autonomous system numbers).
Here’s how Cloudflare tested the boundaries…and found them lacking
- They created brand‑new, unindexed domains. Completely invisible to the public and search engines.
- They applied blanket "do not crawl" instructions via robots.txt, plus WAF rules aimed squarely at PerplexityBot and Perplexity‑User (a sketch of that setup follows this list).
- Despite these measures, Cloudflare reports that Perplexity still returned detailed content from those sites.
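For context, the kind of blanket ban Cloudflare describes takes only a few lines of robots.txt. The sketch below is illustrative, not Cloudflare's actual configuration (the test domains and exact rules weren't published); it shows the shape of such directives and how a rule-following crawler is expected to check them, using Python's standard-library robots.txt parser and a hypothetical domain.

```python
# Illustrative only: this robots.txt mirrors the kind of blanket
# "do not crawl" ban Cloudflare describes; the domain is hypothetical.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A rule-following crawler checks can_fetch() before requesting anything.
for agent in ("PerplexityBot", "Perplexity-User", "SomeOtherBot"):
    allowed = parser.can_fetch(agent, "https://example-test-site.com/any/page")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")  # all blocked
```

The catch, and the whole point of Cloudflare's test, is that robots.txt only binds crawlers that choose to read it. The WAF rules keyed to the PerplexityBot and Perplexity‑User agents were the enforcement backstop, and a request dressed up as Chrome on macOS slips right past a rule that matches on those names.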
Cloudflare claims that when blocked, Perplexity’s traffic appeared to impersonate Chrome on macOS, rotated IPs outside its known range, and used different ASNs—generating millions of requests across tens of thousands of domains daily.
The moral baseline that’s now optional
Ethical bot operation used to be a given (a minimal sketch of a well-behaved fetcher follows this list):
- 🔹 Be transparent—with user‑agents, IP ranges, even Web Bot Auth.
- 🔹 Respect robots.txt. Period.
- 🔹 Don’t drown sites with heavy scraping.
- 🔹 Run dedicated bots for different tasks so site owners can manage access at a granular level.
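For the technically inclined, here's a minimal sketch of what that baseline looks like in practice. The bot name, contact URL, and crawl delay are hypothetical placeholders, not any vendor's actual crawler; the point is the order of operations: identify honestly, check robots.txt first, back off when told no, and throttle.

```python
# Minimal sketch of a cooperative fetcher: honest identification,
# robots.txt compliance, and throttling. Bot name and URL are hypothetical.
import time
import urllib.request
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser

USER_AGENT = "ExampleBot/1.0 (+https://example.com/bot-info)"  # hypothetical identity
CRAWL_DELAY_SECONDS = 2  # conservative default so the crawl doesn't drown the site

def polite_fetch(url: str):
    """Fetch a URL only if the site's robots.txt permits it, identifying honestly."""
    root = "{0.scheme}://{0.netloc}".format(urlparse(url))
    robots = RobotFileParser(urljoin(root, "/robots.txt"))
    robots.read()  # download and parse the site's robots.txt

    if not robots.can_fetch(USER_AGENT, url):
        return None  # respect the block; never retry with a spoofed user agent

    time.sleep(CRAWL_DELAY_SECONDS)  # basic rate limiting
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        return response.read()
```

Nothing here is exotic. It's the absence of these steps, not their difficulty, that's at issue.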
Cloudflare contrasts Perplexity's behavior with OpenAI, which it says follows those rules: crawling transparently and stopping when it hits a block.
Legal scrutiny and friction with publishers intensify
Legal experts, including Cornell's James Grimmelmann, note the uncertainty around how robots.txt holds up in court. He points out that while there's loose consensus that scraping content a site allows is fair game, "Perplexity seems determined to fuck around and find out whether the reverse is true."
- Dow Jones sued Perplexity in 2024 for what it described as “massive copying” of content.
- The BBC threatened legal action unless Perplexity compensated it or deleted the scraped data.
Publishers need to tighten defenses now
Cloudflare has already:
- De‑listed Perplexity as a verified bot.
- Released managed rules to block the stealth crawler—even for free users.
- Enabled more than 2.5M sites to deny AI crawlers access via its "Content Independence Day" tools.
The implications for ecommerce operators
Whether this is a misunderstanding or a calculated risk, the implications are real. Publishers are left scrambling—not because rules changed, but because enforcement mechanisms are being tested. If you’re running an online business, this isn’t about “tech ethics.” It’s about protecting your content, your SEO, and your bottom line.
So what do you do?
- Enforce robots.txt and WAF rules like your content depends on it—because it does.
- Use bot‑management tools that adapt to deception rather than trusting user‑agent strings alone (a rough sketch of the idea follows this list).
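On that second point, treat the User-Agent header as a claim to verify, not an identity. The sketch below uses placeholder IP ranges rather than any crawler's real, published ones, and stands in for what a real bot-management layer does with far richer signals.

```python
# Sketch: don't trust the User-Agent header on its own. Cross-check the
# client IP against the ranges a crawler publishes for itself; a "Chrome"
# UA arriving from crawler infrastructure, or a declared crawler UA arriving
# from outside its published ranges, is a red flag.
# The range below is a placeholder (TEST-NET-1), NOT Perplexity's addresses.
from ipaddress import ip_address, ip_network

DECLARED_CRAWLER_RANGES = {
    "PerplexityBot": [ip_network("192.0.2.0/24")],  # placeholder range
}

def classify(user_agent: str, client_ip: str) -> str:
    """Cross-check the claimed identity (UA header) against the source IP."""
    ip = ip_address(client_ip)
    for bot, networks in DECLARED_CRAWLER_RANGES.items():
        in_range = any(ip in net for net in networks)
        if bot.lower() in user_agent.lower():
            # Declared crawler UA: confirm it really comes from its own ranges.
            return "verified bot" if in_range else "suspicious: spoofed bot UA"
        if in_range:
            # Crawler infrastructure presenting itself as an ordinary browser.
            return "suspicious: browser UA from crawler range"
    return "unclassified: fall back to rate limits and behavioral checks"

print(classify("Mozilla/5.0 (Macintosh) Chrome/126.0 Safari/537.36", "192.0.2.10"))
# -> suspicious: browser UA from crawler range
```

Cloudflare's point, of course, is that rotating IPs and ASNs defeats naive range checks in the other direction, which is why the fallback branch leans on rate limiting and behavioral signals rather than trusting any single header.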