Currently, Stable Diffusion is just a powerful AI image generator. Long-term plans go far beyond that.
Generative AI has clearly been on the rise lately: from text-to-image to text-to-HD-video or text-to-3D, AI systems can create more and more media formats, in some cases fully automatically. New models appear almost every week and are constantly being improved.
Moreover, generative AI tools are making it increasingly easy to digitize the real world. Relatively simple applications for PCs and smartphones use NeRF technology to generate a volumetric 3D scene from individual photos of an object or a room.
Building on current trends, one could argue that generative artificial intelligence will be a powerful driver for increasing digitization. It can significantly boost the quantity and quality of digital content. The ultimate tool would be a single model for creating many types of media that can be handled by professionals and non-professionals alike using natural language.
Stable Diffusion lead thinks an AI-generated Holodeck is doable in a few years
The comments made by Emad Mostaque, CEO of Stability AI, in a Reddit AMA should be seen in the context of the above thesis. Stability AI is the startup behind the open-source image AI Stable Diffusion mentioned at the beginning of this article.
Mostaque cites an experience similar to the Oasis from the VR science fiction film Ready Player One or the famous Holodeck from Star Trek as the goal for the company's generative AI models.
This AI system should continue to be open source so that anyone can "create anything they can imagine." This, he said, requires "full multimodality" in AI models, i.e., generative AI systems that are trained on many types of content and data formats.
Mostaque says Stability AI is already in talks with game studios and other companies that have access to 3D data for data capture. "Yes, we'll be doing something like the Holodeck in a few years," Mostaque says when asked about generative AI for VR and gaming.
Midjourney CEO David Holz voiced similar thoughts not long ago. He expects AI-generated real-time video games to emerge in ten years. Recently, a developer gave a taste of how Stable Diffusion might be implemented in VR worlds.
Mostaque hints at better models and possible copyright solution for Stable Diffusion
Mostaque announced further significant improvements for Stable Diffusion in the near future. Stability AI is currently training models with billions of parameters, which will then be optimized.
"You can think of this like bulking and cutting as we then optimise them. I personally expect models to run on the edge in future at way above MJ v4 or DALLE 2 quality. Future being next year or two," Mostaque says.
The CEO also addresses criticism of the current model, which uses copyrighted data for AI training. This allows it to generate images in the style of renowned artists if their names are included in the prompt. The same applies to competitors DALL-E 2 and Midjourney.
"We are working on fully licensed datasets plus opt-out mechanisms for future model development that we do and support. We will make some announcements about this soon. It should be noted that these models are unlikely to 'mature' for the next year so will get upgraded regularly," Mostaque says.
According to Mostaque, Stability AI is also in discussions with governments about open-source datasets and models, and is working on international AI education initiatives.