A new method speeds up diffusion models by up to 256 times. This could be a step towards real-time AI image generation.

Diffusion models have outpaced alternative image generation systems such as GANs. They generate high-quality, high-resolution images, can modify existing images, and can even generate 3D shapes. However, generating an image requires dozens to hundreds of denoising steps, which is compute-intensive and thus time-consuming.

Nevertheless, the time from prompt input to finished image is already impressive for generative AI models such as DALL-E 2, Midjourney, or Stable Diffusion: depending on the computing power and the model, it takes only a few seconds.

To further reduce the computational effort, and possibly enable real-time image generation in the near future, researchers are investigating how to reduce the number of denoising steps.

Distilled diffusion dramatically speeds up AI image generation

Researchers from Stanford University, Stability AI, and Google Brain are now showing progress, reducing the number of denoising steps required by a factor of at least 20.

Building on previous work by the contributing authors, the team uses progressive network distillation. In this process, a student model first learns to reproduce the output of the original diffusion model. The procedure is then repeated in stages, each stage producing a model that needs significantly fewer steps to denoise an image.

In network distillation, a large AI model acts as a teacher and a small one as a student. During training, the large model passes on its knowledge: in the case of a language AI, for example, the 20 most likely words that complete an incomplete sentence. The small model thus learns to reproduce the results of the large model without adopting its size.
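
To make this concrete, here is a rough sketch of what progressive distillation can look like in code. It is not the authors' implementation: the denoiser, the noising step, and the update rule below are simplified placeholders, and the real method uses DDIM-style updates and a specific parameterization of the training target. What the sketch illustrates is the core idea of one student step matching two teacher steps, with the trained student then becoming the next teacher.

```python
import copy
import torch

class TinyDenoiser(torch.nn.Module):
    """Placeholder denoiser: 16 'pixels' plus one timestep feature in, 16 values out."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(17, 16)

    def forward(self, x, t):
        t_feat = torch.full((x.shape[0], 1), float(t))
        return self.net(torch.cat([x, t_feat], dim=-1))

def denoise_step(model, x, t_from, t_to):
    """Schematic stand-in for one deterministic (DDIM-style) denoising update."""
    return x - model(x, t_from) * (t_from - t_to)

def distill_stage(teacher, student, steps, data, optimizer):
    """Train the student so that ONE of its steps matches TWO consecutive teacher steps."""
    timesteps = torch.linspace(1.0, 0.0, steps + 1)
    for x0 in data:
        i = 2 * torch.randint(0, steps // 2, ()).item()
        t, t_mid, t_next = timesteps[i], timesteps[i + 1], timesteps[i + 2]
        xt = x0 + torch.randn_like(x0) * t                  # crude stand-in for forward noising
        with torch.no_grad():                               # two teacher steps define the target
            target = denoise_step(teacher, denoise_step(teacher, xt, t, t_mid), t_mid, t_next)
        pred = denoise_step(student, xt, t, t_next)         # the student covers both steps at once
        loss = (pred - target).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def progressive_distillation(model, data, start_steps=64, end_steps=4):
    """Repeatedly distill, halving the number of sampling steps in each round."""
    teacher, steps = model, start_steps
    while steps > end_steps:
        student = copy.deepcopy(teacher)
        distill_stage(teacher, student, steps, data,
                      torch.optim.Adam(student.parameters(), lr=1e-4))
        teacher, steps = student, steps // 2                # the student becomes the next teacher
    return teacher

# Toy usage with random "images" of 16 pixels.
data = [torch.randn(8, 16) for _ in range(4)]
fast_model = progressive_distillation(TinyDenoiser(), data, start_steps=16, end_steps=4)
```

Because each round halves the number of sampling steps, a handful of rounds is enough to take a model from hundreds of denoising steps down to a single-digit count.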

According to the paper, the Distilled Diffusion model speeds up inference by "at least ten times" compared to existing methods on the ImageNet 256x256 and LAION datasets. For smaller images, the speedup reaches a factor of 256.

Distilled Diffusion is extremely fast - even on Apple hardware

With only four sampling steps, Distilled Diffusion can produce images of a quality comparable to standard diffusion models. Whereas diffusion models such as Stable Diffusion require dozens to hundreds of steps to produce a good image, Distilled Diffusion could produce "highly realistic images" in as few as one to four denoising steps. Image manipulations such as AI-assisted image editing also work in as few as two to four steps.
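
For a sense of what this means in practice, the number of denoising steps is simply a sampling parameter. The snippet below is a hedged illustration using the open-source diffusers library with a standard, non-distilled Stable Diffusion checkpoint; the checkpoint name and prompt are only examples, and with a vanilla model a four-step setting degrades quality badly. Only a distilled model of the kind described here would make such a low step count viable.

```python
import torch
from diffusers import StableDiffusionPipeline

# Standard Stable Diffusion checkpoint; the distilled weights from this research are not assumed here.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a photo of an astronaut riding a horse"

# Typical setting today: dozens of denoising steps for a good image.
image_slow = pipe(prompt, num_inference_steps=50).images[0]

# A distilled model would only need a handful of steps for the same call;
# with a vanilla checkpoint, quality collapses at this setting.
image_fast = pipe(prompt, num_inference_steps=4).images[0]

image_slow.save("astronaut_50_steps.png")
image_fast.save("astronaut_4_steps.png")
```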

Stability AI founder Emad Mostaque is optimistic that this research will soon be applied in practice. Combined with native support for the Neural Engine in Apple Silicon chips, Stable Diffusion could cut the image generation process from eight seconds to less than one.

Summary
  • Generative AI models such as Stable Diffusion produce high-quality images, but require dozens to hundreds of denoising steps.
  • The researchers demonstrate a method to generate high-quality images in as few as one to four steps. AI images could thus be generated in less than one second instead of eight seconds.
  • According to Emad Mostaque, founder of Stability AI, this research advance could soon be integrated into products.
Jonathan works as a technology journalist who focuses primarily on how easily AI can already be used today and how it can support daily life.