With Stable Diffusion Version 2.0, Stability AI strives to be more legally compliant and future-proof. Two important changes have been made.
First, Stability AI has removed NSFW images from training datasets to limit their generation. According to Emad Mostaque, founder of Stability AI, this is not about censoring NSFW images per se, but about protecting against child abuse.
"You can't have children and NSFW content in an open model," Mostaque writes on Discord. "So get rid of the kids or get rid of the NSFW."
This decision is drawing criticism from parts of the community. Having already leveled accusations of censorship and restriction of artistic freedom at OpenAI (DALL-E, GPT-3) and Midjourney, critics are now directing them at Stability AI as well.
The community-driven NSFW model "Unstable Diffusion," for example, wants to break away from the basic Stable Diffusion model and is planning a Kickstarter campaign for AI models without restrictions.
The limiting rules of companies like Stability AI, OpenAI, and Midjourney prevent these AI systems from becoming useful tools. An artist’s brush is not blocked from drawing anything, nor should the new tools that are becoming integral to the workflow of the next generation of artists.
Moderator of Unstable Diffusion Discord
Mostaque counters the critics on two points. First, Stable Diffusion can be fine-tuned with NSFW content (see below). Second, he says, no critic has been able to explain convincingly why it would make sense to publish a model that includes both NSFW content and images of children.
Artists were not selectively removed from the datasets
Among the criticisms of V2 is the suggestion that Stability AI restricted prompts containing artist names or removed those artists' works from the dataset for the new Stable Diffusion version.
After the release, users noticed that popular prompt extensions such as "in the style of Greg Rutkowski" no longer produced the desired result: a generated image in the style of game artist Greg Rutkowski. Rutkowski is one of the artists who publicly spoke out against the mass copying of individual styles by AI, and he is frequently referenced in the generative AI art scene.
However, Mostaque clarifies that artist prompts were not intentionally restricted, nor were the artists' works removed from the data. Rather, the switch from OpenAI's CLIP model to LAION's OpenCLIP ViT-H/14 changed the results, he says.
CLIP models compute embeddings of images and text and compare their similarity. In this way they guide image generation and are largely responsible for the result. In the case of OpenAI's CLIP model, the underlying dataset was unknown, writes Mostaque.
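To make the idea of comparing embeddings concrete, here is a minimal Python sketch. It is not code from Stable Diffusion or from any CLIP implementation: the three-dimensional vectors and the captions are invented for illustration, whereas real CLIP embeddings come from trained encoders and have hundreds of dimensions. Cosine similarity is used here as the comparison measure.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical, hand-picked toy embeddings (assumption: not real model output).
image_embedding = [0.9, 0.1, 0.2]
text_embeddings = {
    "a fantasy landscape painting": [0.8, 0.2, 0.1],
    "a photo of a kitchen": [0.1, 0.9, 0.3],
}

# Score every caption against the image and pick the closest one.
scores = {
    caption: cosine_similarity(image_embedding, vector)
    for caption, vector in text_embeddings.items()
}
best_caption = max(scores, key=scores.get)
print(best_caption)
```

A caption whose embedding points in nearly the same direction as the image embedding receives a score close to 1. This also illustrates why swapping the text encoder can change results: a different encoder places the same prompt at a different point in the embedding space.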
"OpenAI had loads of celebrities and artists, LAION does not. So if you want them, you'd need to fine-tune back in," Mostaque writes. Users will also have to get used to the changed prompting in V2.
By moving to the LAION dataset, Mostaque hopes to have better control and more transparency for future optimizations of Stable Diffusion, especially for fine-tuning. In addition, Stability AI could use this approach to offer an opt-out option to artists in the future.
AI models for the open-source community
According to Mostaque, it's difficult to remove trained content from a model. It's much easier to add it, Mostaque says, explaining the strategic roadmap for further Stable Diffusion releases.
He metaphorically describes Stable Diffusion as a pizza base that the open-source community can top to their liking through DreamBooth fine-tuning.
The explosion of DreamBooth models has showed how powerful a good base is with just a few images. You will see this being pushed with hundreds, thousands & millions of images to produce really interesting models that we can then smush together to create anything folk can imagine.
— Emad (@EMostaque) November 25, 2022
Mostaque promises "leaps and bounds" of improvements for the Stable Diffusion base model in the coming months. The goal, he says, is to provide the community with ever-better foundations for generative AI. Stability AI will offer its own models for commercial services with licensed content.
Stability AI is also looking at generative AI for 3D content. Mostaque recently revealed an open-source holodeck as the long-term vision for his startup.