Ravenous AI Crawlers Are Breaking the Web—and Driving It Behind Paywalls

Added Sep 2, 2025

AI data-harvesting bots are hammering websites with aggressive, noncompliant crawls that degrade performance and raise costs without sending traffic back. Even big sites struggle with spikes and must scale up, while small sites can be knocked offline. Defensive measures are emerging, but the author predicts they will lead to a fragmented, paywalled web.

Key Points

  • AI crawlers now drive a significant share of web traffic and are far more aggressive than traditional search bots, often ignoring crawl rules and overloading servers.
  • Traffic spikes from AI bots can hit 10–20x normal levels; even large sites must add capacity to keep slow load times from driving users away.
  • Publishers receive little or no referral traffic from AI crawlers, unlike classic search indexing, so costs rise without corresponding revenue.
  • Robots.txt is frequently disregarded (a sample of the rules being flouted follows this list); defenses like paywalls, CAPTCHAs, and DDoS protections are imperfect, prompting new measures (e.g., llms.txt, Cloudflare blocking/tolls, Anubis proof-of-work challenges).
  • The likely outcome is an arms race that pushes the web toward fragmentation and paywalls, undermining the open web.
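
To make the robots.txt point concrete: the crawl rules being ignored are plain-text directives. A minimal file that disallows several published AI-crawler user agents and asks everyone else to slow down might look like the sketch below; the tokens are ones the vendors document, and Crawl-delay is a nonstandard, purely advisory extension:

```
# Block published AI training crawlers (RFC 9309 allows grouping
# several User-agent lines under one rule set)
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Google-Extended
Disallow: /

# Advisory pacing for everyone else; nonstandard and widely ignored
User-agent: *
Crawl-delay: 10
```

As the discussion stresses, compliance is entirely voluntary: a crawler that ignores robots.txt pays no protocol-level penalty, which is what pushes operators toward the enforcement measures described later in the thread.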

Sentiment

Predominantly negative toward AI crawlers and supportive of the article’s thesis; most participants report real harm and view current AI crawling practices as abusive and misaligned with publishers’ interests, with some dissenters attributing the problems to site inefficiency or hosting economics.

In Agreement

  • AI crawlers are causing Slashdot-effect-like load spikes, hammering dynamic pages, exploding DB load, and degrading UX without sending traffic back.
  • Bots routinely ignore robots.txt, crawl-delay, and bandwidth etiquette, and fan out across massive IP pools (including cloud and residential), making rate limiting and UA blocks insufficient.
  • Caching doesn’t save sites because bots traverse deep, rarely-hit pages and dynamic endpoints, blowing past caches and forcing expensive page renders.
  • The incentives are misaligned: AI companies externalize crawl costs onto publishers, compete on freshness, and have the capital to over-crawl; unlike search engines, they deliver little or no monetizable traffic.
  • Common Crawl and open datasets are positive models because they reduce redundant crawling and democratize access.
  • Small sites are being forced into harsher defenses (logins, paywalls, CAPTCHAs, Cloudflare/Anubis), accelerating centralization and making the web worse for humans and accessibility.
  • Stronger, layered mitigations (ASN/IP intelligence, Spamhaus lists, nginx/front-door filtering, throttling, honeypots) are necessary just to stay online; a minimal nginx sketch of the front-door piece follows this list.
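
As an illustration of the front-door filtering and throttling commenters describe, a minimal nginx sketch might combine a user-agent blocklist with per-IP rate limiting. This is one plausible setup, not a configuration from the thread; the bot list, rates, and upstream address are placeholders:

```
# http context: flag known AI-crawler user agents (case-insensitive)
map $http_user_agent $is_ai_bot {
    default                                                0;
    "~*(GPTBot|ClaudeBot|CCBot|Bytespider|PerplexityBot)"  1;
}

# One token bucket per client IP: ~2 requests/second sustained
limit_req_zone $binary_remote_addr zone=perip:10m rate=2r/s;

server {
    listen 80;
    server_name example.com;   # placeholder

    location / {
        # Refuse flagged crawlers outright
        if ($is_ai_bot) { return 403; }

        # Throttle everyone else: absorb short bursts, then shed load
        limit_req zone=perip burst=20 nodelay;
        limit_req_status 429;

        proxy_pass http://127.0.0.1:8080;   # placeholder upstream
    }
}
```

The catch, as the same commenters note, is that UA matching and per-IP buckets only stop honest or lazy crawlers; fleets that rotate residential IPs and spoof browser user agents evade both, which is why ASN intelligence, Spamhaus lists, and honeypots get stacked on top.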

Opposed

  • Some blame inefficient stacks (especially WordPress’s DB patterns and dynamic rendering) and poor hosting choices; with better caching, object stores, and more resources, sites should tolerate 1 QPS per crawler (see the microcaching sketch after this list).
  • A few argue bandwidth pricing (e.g., Netlify overages) is the bigger problem than bots; switching to cheaper hosts/CDNs would mitigate costs.
  • Centralized solutions like Cloudflare’s Signed Agents or default bot blocking are seen by some as power grabs that further centralize the web and create surveillance/attestation risks.
  • PoW/micropayment gates, cryptographic identity, or bot whitelists are criticized as user-hostile, accessibility-unfriendly, or impractical social/market solutions that won’t stop determined bad actors.
  • Serving dumps or standardized feeds may not help because aggressive crawlers won’t use them, or sites shouldn’t have to redesign around feeding AI at all.
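
The “fix your stack” argument can be made concrete with microcaching: even a one-second cache in front of a dynamic app collapses a burst of identical requests into roughly one backend render per second. A hypothetical nginx/PHP-FPM sketch, with socket path and cache location as placeholders:

```
# http context: small on-disk cache for rendered pages
fastcgi_cache_path /var/cache/nginx levels=1:2 keys_zone=micro:10m
                   max_size=1g inactive=10m;

server {
    listen 80;
    server_name blog.example.com;   # placeholder
    root /var/www/html;

    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/run/php/php-fpm.sock;   # placeholder socket

        # Serve identical requests from cache for 1 second, so a
        # 100-requests/second spike costs ~1 PHP/DB render per second
        fastcgi_cache micro;
        fastcgi_cache_key $scheme$request_method$host$request_uri;
        fastcgi_cache_valid 200 301 1s;
        fastcgi_cache_lock on;                     # dedupe concurrent misses
        fastcgi_cache_use_stale updating error timeout;
    }
}
```

The agreeing camp’s rebuttal still applies, though: microcaching only helps on hot URLs, and crawlers that walk every archive page and query-string variant produce a near-100% miss rate, so each request is still a full render.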