Cloudflare accuses Perplexity of evading no‑crawl rules—publishers, wake up or get played.
Perplexity's crawl credibility goes missing
Let's cut to the chase: Perplexity is facing scrutiny over how it accesses website content. According to Cloudflare, the company may be bypassing robots.txt bans and web application firewall (WAF) blocks by altering its user agents and rotating through various ASNs (autonomous system numbers).
Here’s how Cloudflare tested the boundaries…and found them lacking
- They created brand‑new, unindexed domains. Completely invisible to the public and search engines.
- They applied blanket "do not crawl" instructions via robots.txt, plus WAF rules aimed squarely at PerplexityBot and Perplexity‑User (a sketch of that setup follows this list).
- Despite these measures, Cloudflare reports that Perplexity still returned detailed content from those sites.
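For context, the kind of blanket ban Cloudflare describes takes only a few lines of robots.txt. The sketch below is illustrative, not Cloudflare's actual configuration (the test domains and exact rules weren't published); it shows the shape of such directives and how a rule-following crawler is expected to check them, using Python's standard-library robots.txt parser and a hypothetical domain.

```python
# Illustrative only: this robots.txt mirrors the kind of blanket
# "do not crawl" ban Cloudflare describes; the domain is hypothetical.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A rule-following crawler checks can_fetch() before requesting anything.
for agent in ("PerplexityBot", "Perplexity-User", "SomeOtherBot"):
    allowed = parser.can_fetch(agent, "https://example-test-site.com/any/page")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")  # all blocked
```

The catch, and the whole point of Cloudflare's test, is that robots.txt only binds crawlers that choose to read it. The WAF rules keyed to the PerplexityBot and Perplexity‑User agents were the enforcement backstop, and a request dressed up as Chrome on macOS slips right past a rule that matches on those names.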
Cloudflare claims that when blocked, Perplexity’s traffic appeared to impersonate Chrome on macOS, rotated IPs outside its known range, and used different ASNs—generating millions of requests across tens of thousands of domains daily.
The moral baseline that’s now optional
Ethical bot operation used to be a given (a minimal sketch of a well-behaved fetcher follows this list):
- 🔹 Be transparent—with user‑agents, IP ranges, even Web Bot Auth.
- 🔹 Respect robots.txt. Period.
- 🔹 Don’t drown sites with heavy scraping.
- 🔹 Run dedicated bots for different tasks so site owners can manage access at a granular level.
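For the technically inclined, here's a minimal sketch of what that baseline looks like in practice. The bot name, contact URL, and crawl delay are hypothetical placeholders, not any vendor's actual crawler; the point is the order of operations: identify honestly, check robots.txt first, back off when told no, and throttle.

```python
# Minimal sketch of a cooperative fetcher: honest identification,
# robots.txt compliance, and throttling. Bot name and URL are hypothetical.
import time
import urllib.request
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser

USER_AGENT = "ExampleBot/1.0 (+https://example.com/bot-info)"  # hypothetical identity
CRAWL_DELAY_SECONDS = 2  # conservative default so the crawl doesn't drown the site

def polite_fetch(url: str):
    """Fetch a URL only if the site's robots.txt permits it, identifying honestly."""
    root = "{0.scheme}://{0.netloc}".format(urlparse(url))
    robots = RobotFileParser(urljoin(root, "/robots.txt"))
    robots.read()  # download and parse the site's robots.txt

    if not robots.can_fetch(USER_AGENT, url):
        return None  # respect the block; never retry with a spoofed user agent

    time.sleep(CRAWL_DELAY_SECONDS)  # basic rate limiting
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        return response.read()
```

Nothing here is exotic. It's the absence of these steps, not their difficulty, that's at issue.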
Cloudflare contrasts Perplexity's behavior with OpenAI, which it says follows those rules: crawling transparently and stopping when it hits a block.
Legal scrutiny and friction with publishers intensify
Legal experts, including Cornell's James Grimmelmann, note the uncertainty around how robots.txt holds up in court. He points out that while there's loose consensus that scraping content a site allows is fair game, "Perplexity seems determined to fuck around and find out whether the reverse is true."
- Dow Jones sued Perplexity in 2024 for what it described as “massive copying” of content.
- The BBC threatened legal action unless Perplexity compensated it or deleted the scraped data.
Publishers need to tighten defenses now
Cloudflare has already:
- De‑listed Perplexity as a verified bot.
- Released managed rules to block the stealth crawler—even for free users.
- Enabled more than 2.5M sites to deny AI crawlers access via its "Content Independence Day" tools.
The implications for ecommerce operators
Whether this is a misunderstanding or a calculated risk, the implications are real. Publishers are left scrambling—not because rules changed, but because enforcement mechanisms are being tested. If you’re running an online business, this isn’t about “tech ethics.” It’s about protecting your content, your SEO, and your bottom line.
So what do you do?
- Enforce robots.txt and WAF rules like your content depends on it—because it does.
- Use bot‑management tools that adapt to deception rather than trusting user‑agent strings alone (a rough sketch of the idea follows this list).
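On that second point, treat the User-Agent header as a claim to verify, not an identity. The sketch below uses placeholder IP ranges rather than any crawler's real, published ones, and stands in for what a real bot-management layer does with far richer signals.

```python
# Sketch: don't trust the User-Agent header on its own. Cross-check the
# client IP against the ranges a crawler publishes for itself; a "Chrome"
# UA arriving from crawler infrastructure, or a declared crawler UA arriving
# from outside its published ranges, is a red flag.
# The range below is a placeholder (TEST-NET-1), NOT Perplexity's addresses.
from ipaddress import ip_address, ip_network

DECLARED_CRAWLER_RANGES = {
    "PerplexityBot": [ip_network("192.0.2.0/24")],  # placeholder range
}

def classify(user_agent: str, client_ip: str) -> str:
    """Cross-check the claimed identity (UA header) against the source IP."""
    ip = ip_address(client_ip)
    for bot, networks in DECLARED_CRAWLER_RANGES.items():
        in_range = any(ip in net for net in networks)
        if bot.lower() in user_agent.lower():
            # Declared crawler UA: confirm it really comes from its own ranges.
            return "verified bot" if in_range else "suspicious: spoofed bot UA"
        if in_range:
            # Crawler infrastructure presenting itself as an ordinary browser.
            return "suspicious: browser UA from crawler range"
    return "unclassified: fall back to rate limits and behavioral checks"

print(classify("Mozilla/5.0 (Macintosh) Chrome/126.0 Safari/537.36", "192.0.2.10"))
# -> suspicious: browser UA from crawler range
```

Cloudflare's point, of course, is that rotating IPs and ASNs defeats naive range checks in the other direction, which is why the fallback branch leans on rate limiting and behavioral signals rather than trusting any single header.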