(Don't) Feed the Robots
"My site [with] modest traffic (~1,000 visitors per day) went from serving ~1GB/hr (to tons of AI crawlers!) to ~50MB/hr in traffic"
"Can you tell what was the point where I

1. Manually blocked a few AI crawlers
2. Moved my nameserver to Cloudflare plus turned on AI crawler blocking

My site w modest traffic (~1,000 visitors per day) went from serving ~1GB/hr (to tons of AI crawlers!) to ~50MB/hr in traffic"
— Gergely Orosz (@gergely.pragmaticengineer.com) 2025-04-01T04:35:30.986Z
BlueSky: https://bsky.app/profile/gergely.pragmaticengineer.com/post/3llq2cli6uk2k
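The "manually blocked a few AI crawlers" step is typically a `robots.txt` rule per crawler user agent. The agents below (GPTBot, ClaudeBot, CCBot) are real, published crawler names, but which ones a given site should block is situational, and this only deters crawlers that actually honor `robots.txt`; Cloudflare's blocking works at the network level regardless.

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /
```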
"IQSS Dataverse has been experiencing service instability due to increasingly intense data-scrubbing from LLM companies, search engines and similar outfits. We are working with Harvard Systems to stabilize the service."

"When we detect unauthorized crawling, rather than blocking the request, we will link to a series of AI-generated pages that are convincing enough to entice a crawler to traverse them. But while real looking, this content is not actually the content of the site we are protecting, so the crawler wastes time and resources.
As an added benefit, AI Labyrinth also acts as a next-generation honeypot. No real human would go four links deep into a maze of AI-generated nonsense. Any visitor that does is very likely to be a bot, so this gives us a brand-new tool to identify and fingerprint bad bots, which we add to our list of known bad actors.
...
It is important to us that we don’t generate inaccurate content that contributes to the spread of misinformation on the Internet, so the content we generate is real and related to scientific facts, just not relevant or proprietary to the site being crawled."
— Cloudflare
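The mechanism Cloudflare describes can be sketched in a few lines: serve decoy pages whose only real content is a link one level deeper, and flag any client that descends past a depth no human would reach. The function names, URL scheme, and filler text below are illustrative assumptions, not Cloudflare's implementation, which uses AI-generated content and full bot fingerprinting.

```python
# Minimal sketch of the "AI Labyrinth" idea quoted above: decoy pages
# that only link deeper into a maze, plus a depth-based classifier.
# Names, the /maze/ URL scheme, and the page text are assumptions.

HUMAN_DEPTH_LIMIT = 4  # "no real human would go four links deep"

def decoy_page(depth: int) -> str:
    """Return a plausible-looking HTML page whose only real content
    is a link one level deeper into the maze."""
    return (
        f"<html><body><h1>Archive, part {depth}</h1>"
        f"<p>Generic factual filler text would go here.</p>"
        f'<a href="/maze/{depth + 1}">continue</a>'
        "</body></html>"
    )

def looks_like_bot(depth_reached: int) -> bool:
    """Flag any client that traversed as deep as no human plausibly
    would; such clients can then be fingerprinted and blocklisted."""
    return depth_reached >= HUMAN_DEPTH_LIMIT
```

A crawler that blindly follows every link keeps descending and trips the threshold; a human reader bails out after a page or two and never does.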