"Poisoning" datasets to fight AI sounds appealing, but it doesn't actually work, says developer Xe Iaso. Her tool, Anubis, takes a different approach: it puts invisible computational hurdles in the way of bot scrapers.
"From what I have learned, poisoning datasets doesn't work. It makes you feel good, but it ends up using more compute than you end up saving. I don't know the polite way to say this, but if you piss in an ocean, the ocean does not turn into piss," says Xe Iaso, creator of the open-source project Anubis, which is designed to protect web servers from AI scrapers. She discussed her approach in a recent interview with 404 Media.
By Iaso's assessment, a popular tactic for fighting AI models is mostly ineffective: seeding public content with deliberately flawed or adversarial data, using tools like Glaze or Nightshade, to sabotage training. She argues that these methods have little impact on massive AI datasets and consume far more resources than they save.
Anubis makes automated crawling more expensive
The real challenge, according to Iaso, isn't just technical: it's an uneven playing field. Big AI companies have the compute to process or filter even heavily poisoned data, while individual artists or developers would have to spend enormous effort just to cause minor disruption. Iaso says it's more effective to control access to content at a technical level, which is where Anubis comes in. The tool requires each client to solve a small cryptographic puzzle in the browser: a cost that's negligible for a single page view, but one that adds up quickly for anyone scraping millions of pages. Real users never notice it.
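The underlying mechanism is a hashcash-style proof of work. The sketch below shows the general idea in browser-side TypeScript; the function names, the challenge string, and the difficulty value are illustrative assumptions, not Anubis's actual API:

```typescript
// Hashcash-style proof of work: find a nonce such that
// SHA-256(challenge + nonce) starts with `difficulty` zero hex digits.
// Names and parameters here are illustrative, not Anubis's real interface.

async function sha256Hex(input: string): Promise<string> {
  const data = new TextEncoder().encode(input);
  const digest = await crypto.subtle.digest("SHA-256", data);
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

async function solveChallenge(challenge: string, difficulty: number): Promise<number> {
  const target = "0".repeat(difficulty);
  // Brute-force nonces; on average this takes 16^difficulty hashes.
  for (let nonce = 0; ; nonce++) {
    if ((await sha256Hex(challenge + nonce)).startsWith(target)) {
      return nonce;
    }
  }
}

// A single page view costs one solve, typically a fraction of a second.
// Scraping millions of pages means paying for millions of solves.
solveChallenge("challenge-token-from-server", 4).then((nonce) =>
  console.log("solved with nonce", nonce)
);
```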
Anubis is designed to selectively raise the price of automated crawling. It acts like an "invisible CAPTCHA": any client that isn't running JavaScript properly, or isn't behaving like a real browser, gets blocked. Unlike traditional CAPTCHAs, it never asks real users to solve anything by hand, so the experience stays accessible. Anubis is lightweight, open source, and easy to self-host on almost any server. It's already in use by organizations like GNOME, FFmpeg, and UNESCO.
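What makes this design workable is the asymmetry between solving and checking: the server verifies a submitted answer with a single hash, while producing that answer took many. A minimal Node.js sketch of the check, reusing the same hypothetical names as above:

```typescript
import { createHash } from "node:crypto";

// Verification recomputes one SHA-256 and compares against the target.
// Solving averages 16^difficulty hashes; checking costs exactly one,
// so the server's cost per visitor stays tiny while the scraper's grows.
function verifySolution(challenge: string, nonce: number, difficulty: number): boolean {
  const hash = createHash("sha256")
    .update(challenge + nonce)
    .digest("hex");
  return hash.startsWith("0".repeat(difficulty));
}
```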