Google's MobileDiffusion generates AI images on mobile devices in less than a second

Feb 3, 2024

Google

Google's MobileDiffusion is a fast and efficient way to create images from text on smartphones.

MobileDiffusion is Google's latest development in text-to-image generation. Designed specifically for smartphones, the diffusion model generates high-quality images from text input in less than a second.

With a model size of only 520 million parameters, it is significantly smaller than models with billions of parameters such as Stable Diffusion and SDXL, making it more suitable for use on mobile devices.

The researchers' tests show that MobileDiffusion can generate images with a resolution of 512 x 512 pixels in about half a second on both Android smartphones and iPhones. The output is continuously updated as you type, as Google's demo video shows.

Video: Google

MobileDiffusion consists of three main components: a text encoder, a diffusion network, and an image decoder.

The diffusion network is a UNet whose transformer blocks combine a self-attention layer, a cross-attention layer, and a feed-forward layer. These layers are crucial for text comprehension in diffusion models.

However, these attention layers are computationally expensive and resource-intensive, especially at high resolutions. Google therefore uses a so-called UViT architecture, which places more transformer blocks in the low-resolution middle section of the UNet to reduce resource requirements.
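To see why concentrating transformer blocks in the low-resolution part of a UNet saves resources, note that self-attention cost grows with the square of the number of spatial positions. The following is a rough back-of-the-envelope sketch, not Google's implementation: the 64 x 64 starting feature map (assuming an 8x downsampled 512 x 512 image), the channel widths, and the function name are assumptions chosen for illustration.

```python
def self_attention_cost(height: int, width: int, channels: int) -> int:
    """Approximate multiply-adds for one self-attention layer.

    Counts only the two big matrix products (Q @ K^T and weights @ V),
    each costing roughly tokens^2 * channels. Projections are ignored.
    """
    tokens = height * width
    return 2 * tokens * tokens * channels


# Hypothetical UNet levels: (feature map side length, channel width).
levels = [(64, 320), (32, 640), (16, 1280)]

base = self_attention_cost(16, 16, 1280)
for side, channels in levels:
    cost = self_attention_cost(side, side, channels)
    print(f"{side:>2} x {side:<2} map: {cost / base:.0f}x the cost of the 16 x 16 level")
```

Under these assumptions, a single self-attention layer at the highest resolution costs as much as dozens of such layers at the lowest one, which is the intuition behind moving them toward the middle of the network.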

In addition, distillation and a hybrid with a Generative Adversarial Network (GAN) are used to reduce sampling to between one and eight steps.
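Sampling in one to eight steps means the diffusion network is evaluated only that many times per image, instead of the dozens of evaluations a conventional sampler uses. As a minimal sketch of what that looks like on the schedule side, the helper below picks evenly spaced timesteps. The 1,000 training timesteps, the even spacing, and the function name are assumptions for illustration and are not taken from Google's work.

```python
def sampling_timesteps(num_steps: int, train_steps: int = 1000) -> list[int]:
    """Return evenly spaced timesteps, from noisiest to cleanest.

    Each returned timestep corresponds to one evaluation of the
    diffusion network during sampling.
    """
    if not 1 <= num_steps <= train_steps:
        raise ValueError("num_steps must be between 1 and train_steps")
    stride = train_steps // num_steps
    return [train_steps - 1 - i * stride for i in range(num_steps)]


for steps in (1, 8, 20):
    schedule = sampling_timesteps(steps)
    print(f"{steps:>2} steps -> {len(schedule)} network evaluations, first timesteps {schedule[:3]}")
```

Distillation and GAN-style fine-tuning are what make such short schedules produce usable images; simply shortening the schedule of an ordinary diffusion model generally degrades quality.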

Google's UNet architecture contains more transformer blocks in the middle and skips self-attention layers at higher resolutions. | Image: Google

Google has not yet made the model freely available, nor has it announced any plans to do so. Rather, the research should be considered a step toward the goal of democratizing text-to-image generation on mobile devices.

Google has its own smartphone line, the Pixel series, where generative AI is becoming increasingly important for both hardware and software.

Image generation is getting faster and faster

Last year, Qualcomm demonstrated that a smartphone could quickly generate images based on Stable Diffusion.

By optimizing its AI stack, the U.S. chipmaker was able to run the image generator on what was then a high-end Android smartphone, a remarkable technical advance in February 2023. However, generating a 512 x 512 pixel image with 20 inference steps still took about 15 seconds.
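The Qualcomm figures allow a rough per-step estimate. The sketch below is simple arithmetic on the numbers above and assumes, as a simplification, that all 15 seconds were spent in the 20 inference steps; in practice the text encoder and image decoder also take time, so the per-step value is an upper bound. The function name is hypothetical.

```python
def seconds_per_step(total_seconds: float, steps: int) -> float:
    """Upper-bound latency per inference step, ignoring fixed overhead."""
    return total_seconds / steps


per_step = seconds_per_step(15.0, 20)
print(f"About {per_step:.2f} s per step")

# Illustrative only: what fewer steps would cost at that same per-step rate.
for steps in (20, 8, 1):
    print(f"{steps:>2} steps -> about {steps * per_step:.2f} s")
```

This illustrates why cutting the step count matters so much for latency, although a sub-second result on a phone also depends on a smaller, faster network.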

The advantage of Google's MobileDiffusion approach is that it delivers fast results regardless of the operating system. It performs even better on the iPhone 15 Pro than on Samsung's latest flagship, the Galaxy S24, which runs Google's Android.

More recently, SDXL Turbo and PixArt-δ have also made significant advances in quasi-real-time image generation, albeit on more powerful systems.
