
"Poisoning" datasets to fight AI sounds appealing, but it doesn't actually work, says developer Xe Iaso. Her tool, Anubis, takes a different approach: it puts invisible computational hurdles in the way of bot scrapers.


"From what I have learned, poisoning datasets doesn't work. It makes you feel good, but it ends up using more compute than you end up saving. I don't know the polite way to say this, but if you piss in an ocean, the ocean does not turn into piss," says Xe Iaso, creator of the open-source project Anubis, which is designed to protect web servers from AI scrapers. She discussed her approach in a recent interview with 404 Media.

This makes a popular tactic for fighting AI models largely ineffective: inserting deliberately flawed or harmful data into public content with tools like Glaze or Nightshade to sabotage training. Iaso argues that these methods barely register against massive AI datasets and consume far more resources than they save.

Anubis makes automated crawling more expensive

The real challenge, according to Iaso, isn't just technical - it's an uneven playing field. Big AI companies have the compute to filter or process even heavily poisoned data, while individual artists and developers would need enormous effort just to cause minor disruption. Iaso says it's more effective to control access to content at a technical level, which is where Anubis comes in. The tool makes each visiting client solve a small cryptographic proof-of-work puzzle in JavaScript before content is served, driving up costs for anyone scraping millions of pages while remaining imperceptible to real users.
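The cost asymmetry behind this idea can be sketched in a few lines. This is a minimal Python illustration, not Anubis's actual protocol: the hex-zero difficulty target and the challenge format here are assumptions for the example (Anubis itself is a Go service and runs its solver in the visitor's browser).

```python
import hashlib
import secrets


def make_challenge() -> str:
    # Server side: issue a random per-visitor challenge string.
    return secrets.token_hex(16)


def solve(challenge: str, difficulty: int = 4) -> int:
    # Client side: brute-force a nonce so that SHA-256(challenge + nonce)
    # starts with `difficulty` hex zeros. Cheap for one page view,
    # expensive when repeated across millions of scraped pages.
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1


def verify(challenge: str, nonce: int, difficulty: int = 4) -> bool:
    # Server side: a single hash, no matter how long the client worked.
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)
```

The asymmetry is the point: verification costs the server one hash, while solving costs the client thousands on average, and that cost scales linearly with the number of pages a scraper requests.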


Anubis is designed to selectively raise the price of automated crawling. It acts like an "invisible CAPTCHA": any client that doesn't run JavaScript properly, or doesn't behave like a real browser, gets blocked. Unlike traditional CAPTCHAs, it adds no visible hurdle for human visitors, keeping the experience accessible. Anubis is lightweight, open source, and easy to self-host on almost any server. It's already in use by organizations including GNOME, FFmpeg, and UNESCO.

Summary
  • Developer Xe Iaso argues that "poisoning" public datasets to disrupt AI training is largely ineffective, as big AI companies can easily filter out or process flawed data, making the tactic resource-intensive and ultimately futile.
  • Iaso's open-source tool, Anubis, takes a different approach by blocking automated crawlers with invisible cryptographic puzzles, increasing the cost of large-scale scraping without affecting legitimate users.
  • Anubis is already in use by organizations such as GNOME, FFmpeg, and UNESCO, offering a lightweight, self-hostable way for anyone to protect their web content from unwanted AI scraping.
Max is the managing editor of THE DECODER, bringing his background in philosophy to explore questions of consciousness and whether machines truly think or just pretend to.