Artists have had 80 million images removed from the training data for Stable Diffusion 3. But the copyright problem surrounding large AI image models is far from solved.
There is a flip side to the success of AI image models: From the beginning, companies like Stability AI have been criticized for using artists' work to train Stable Diffusion without their consent. The parties involved are now fighting in the courts as well.
The AI startup Spawning has taken it upon itself to bring more transparency to the data used to train AI. As a first step, it released haveibeentrained.com, a search engine for the training data of image models. Some users have found sensitive data about themselves in the dataset this way.
The platform also allows artists to remove their images from datasets used to train AI models. In December 2022, Spawning announced that Stability AI would honor these so-called artist opt-outs when training Stable Diffusion 3. The opt-out deadline passed on March 3.
Spawning now reports that artists have opted 80 million artworks out of AI training, and it considers this a success.
"This establishes a significant precedent towards realizing our vision of consenting AI, and we are just getting started!" the organization announced on Twitter.
However, this is only a drop in the bucket: the LAION dataset used by Stable Diffusion contains more than two billion images, so the 80 million opt-outs make up less than four percent of it.
Opt-outs via ArtStation and Shutterstock
To make the opt-out work, Spawning relied on partnerships with platforms such as ArtStation and Shutterstock. Through these platforms, artists could also opt out of AI training, or their images were excluded by default. More than 40,000 opt-out requests were submitted directly through haveibeentrained.com.
According to Spawning, each copyright claim was reviewed manually. Tools for removing multiple images by a single artist at once are in the works but were not ready in time for Stable Diffusion 3. The platform also lets artists explicitly make their work available for AI training if it is not already in the dataset.
Spawning also writes: "We provided these services for free. We believe that consenting data will be of great benefit to both AI organizations and the people who these systems are trained on. We recently launched human verified artist opt-in, and have more tools and partnerships in the works."
Not all artists welcome the opt-out process. Some argue that a general opt-in process would be better, meaning that images would have to be actively made available for AI training. Having to register on a website and hand over even more data just to remove work from a dataset it was never intended for hardly seems the best way to handle consent in AI.
Large gesture, small effect
Overall, the opt-out option is a respectful signal from Stability AI and a building block for the future development of AI models. As a result, the styles of some artists may no longer be reproducible out of the box with Stable Diffusion 3.
But it is questionable whether AI models will stop using copyrighted material for image generation in the future, even with opt-in procedures. After all, new methods such as ControlNet make it increasingly easy for users to guide Stable Diffusion with their own images. Responsibility for copyright violations is therefore likely to shift from companies to individual users.
That imitations of specific styles are easier to spot may explain why the debate about AI training permissions is more heated for image generators than for language models. But publishers and authors also want to keep their work out of language model training data.