Date: 2025-07-08
AI in practice
Maximilian Schreiner

A developer focused on stopping AI bots says poisoning datasets is like peeing in the ocean

Max is the managing editor of THE DECODER, bringing his background in philosophy to explore questions of consciousness and whether machines truly think or just pretend to.

"Poisoning" datasets to fight AI sounds appealing, but it doesn't actually work, says developer Xe Iaso. Her tool, Anubis, takes a different approach: it puts invisible computational hurdles in the way of bot scrapers.


"From what I have learned, poisoning datasets doesn't work. It makes you feel good, but it ends up using more compute than you end up saving. I don't know the polite way to say this, but if you piss in an ocean, the ocean does not turn into piss," says Xe Iaso, creator of the open-source project Anubis, which is designed to protect web servers from AI scrapers. She discussed her approach in a recent interview with 404 Media.

If Iaso is right, a popular tactic for fighting AI models is mostly ineffective: inserting deliberately flawed or harmful data into public content, using tools like Glaze or Nightshade, to sabotage training. She argues that these methods have little impact on massive AI datasets and cost the people using them far more resources than they cost the AI companies.

Anubis makes automated crawling more expensive

The real challenge, according to Iaso, isn't just technical - it's about an uneven playing field. Big AI companies have the compute to process or filter even heavily poisoned data. For individual artists or developers, it would take enormous effort just to create minor disruptions. Iaso says it's more effective to control access to content at a technical level, which is where Anubis comes in. The tool forces bots to solve cryptographic puzzles in their browser, driving up costs for anyone trying to scrape millions of pages, while real users never notice.
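To make the idea of a cryptographic puzzle concrete, here is a minimal hashcash-style sketch in Python. It is an illustration under stated assumptions, not Anubis's actual code: the function names, the challenge string, and the "leading zero hex digits" difficulty rule are all hypothetical choices for this example. The point it shows is the asymmetry: the client has to try many nonces, while the server checks the answer with a single hash.

```python
import hashlib
from itertools import count


def digest_for(challenge: str, nonce: int) -> str:
    """Hex SHA-256 digest of the challenge string followed by the nonce."""
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve(challenge: str, difficulty: int) -> int:
    """Client side (hypothetical): brute-force the smallest nonce whose
    digest starts with `difficulty` zero hex digits."""
    target = "0" * difficulty
    for nonce in count():
        if digest_for(challenge, nonce).startswith(target):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side (hypothetical): a single hash confirms the work was done."""
    return digest_for(challenge, nonce).startswith("0" * difficulty)


if __name__ == "__main__":
    challenge = "example-challenge"  # assumed: a random string issued per visitor
    nonce = solve(challenge, 3)
    print(nonce, verify(challenge, nonce, 3))
```

A single visitor pays this cost once and barely notices it. A crawler that has to solve such a puzzle again and again pays it every time.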


Anubis is designed to selectively raise the price of automated crawling. It acts like an "invisible CAPTCHA": anyone not running JavaScript properly, or not behaving like a real browser, gets blocked. Unlike traditional CAPTCHAs, this approach doesn't create barriers for users, keeping the experience accessible. Anubis is lightweight, open source, and easy to self-host on almost any server. It's already in use by organizations like GNOME, FFmpeg, and UNESCO.
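A rough back-of-the-envelope calculation shows why such a scheme weighs on bulk scrapers rather than on individual readers. The sketch below assumes a generic puzzle that requires a hash with a given number of leading zero hex digits, so each attempt succeeds with probability 1/16^d and the expected number of attempts is 16^d. The difficulty value and the number of challenges are hypothetical, and how often a real crawler would have to re-solve a challenge is not something the article specifies.

```python
def expected_hashes(difficulty: int) -> int:
    """Expected attempts for one puzzle: each try succeeds with
    probability 1 / 16**difficulty, so the mean is 16**difficulty."""
    return 16 ** difficulty


def total_hashes(challenges: int, difficulty: int) -> int:
    """Expected total work when a client must solve `challenges` puzzles."""
    return challenges * expected_hashes(difficulty)


if __name__ == "__main__":
    difficulty = 4  # hypothetical setting, not taken from the article
    print(total_hashes(1, difficulty))          # one human visitor -> 65536
    print(total_hashes(1_000_000, difficulty))  # a large crawl -> 65536000000
```

Under these assumptions the per-visitor cost stays fixed, while the crawler's cost grows linearly with the number of challenges it has to solve.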

Summary
  • Developer Xe Iaso argues that "poisoning" public datasets to disrupt AI training is largely ineffective, as big AI companies can easily filter out or process flawed data, making the tactic resource-intensive and ultimately futile.
  • Iaso's open-source tool, Anubis, takes a different approach by blocking automated crawlers with invisible cryptographic puzzles, increasing the cost of large-scale scraping without affecting legitimate users.
  • Anubis is already in use by organizations such as GNOME, FFmpeg, and UNESCO, offering a lightweight, self-hostable way for anyone to protect their web content from unwanted AI scraping.
Sources
404 Media