A new method speeds up diffusion models by up to 256 times. This could be a step towards real-time AI image generation.
Diffusion models have outpaced alternative image generation systems such as GANs. They generate high-quality, high-resolution images, can modify existing images, and can even generate 3D shapes. However, producing an image requires dozens to hundreds of denoising steps, which makes the process compute-intensive and therefore slow.
Even so, the time from prompt to finished image is already impressive for generative AI models such as DALL-E 2, Midjourney, or Stable Diffusion: depending on the computing power and the model, it takes only a few seconds.
To further reduce the computational effort, and perhaps enable real-time image generation in the near future, researchers are investigating ways to cut the number of denoising steps.
Distilled diffusion dramatically speeds up AI image generation
Researchers from Stanford University, Stability AI, and Google Brain are now showing progress, reducing the number of denoising steps by at least a factor of 20.
Building on previous work by the contributing authors, the team uses progressive network distillation: a student model learns to reproduce the output of the original diffusion model in half as many sampling steps, and repeating the procedure yields a diffusion model that needs significantly fewer steps to denoise an image.
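A minimal sketch of one such distillation stage, assuming a denoiser that predicts the clean image from a noisy input and a deterministic DDIM-style sampler. The names `teacher`, `student`, `alpha`, and `sigma` are illustrative placeholders, not identifiers from the paper's code; the idea is that the student is trained to match two teacher steps with a single step:

```python
import torch
import torch.nn.functional as F

def ddim_step(model, x_t, t, t_next, alpha, sigma):
    """One deterministic DDIM-style step from time t to t_next."""
    x0_pred = model(x_t, t)                      # network predicts the clean image
    eps = (x_t - alpha(t) * x0_pred) / sigma(t)  # noise implied by that prediction
    return alpha(t_next) * x0_pred + sigma(t_next) * eps

def progressive_distillation_loss(teacher, student, x0, t, t_mid, t_next, alpha, sigma):
    """Train the student to match two teacher steps (t -> t_mid -> t_next) in one."""
    noise = torch.randn_like(x0)
    x_t = alpha(t) * x0 + sigma(t) * noise       # diffuse a clean image to time t
    with torch.no_grad():                        # target: two small teacher steps
        x_mid = ddim_step(teacher, x_t, t, t_mid, alpha, sigma)
        target = ddim_step(teacher, x_mid, t_mid, t_next, alpha, sigma)
    pred = ddim_step(student, x_t, t, t_next, alpha, sigma)  # student: one big step
    return F.mse_loss(pred, target)
```

Once the student matches the teacher, it becomes the teacher of the next stage, halving the step count again; iterating takes a sampler from, say, 1,024 steps down to the single-digit range. (This plain MSE is a simplification; the published method regresses a reweighted prediction target.)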
In network distillation, a large AI model acts as the teacher and a small one as the student. During training, the large model passes on its knowledge: for a language model, for example, the 20 most likely words that complete an unfinished sentence. The small model thus learns to reproduce the results of the large one without inheriting its size.
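This classic teacher–student setup fits in a few lines. A toy sketch in PyTorch, with the temperature value chosen purely for illustration:

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften the teacher's distribution so lower-ranked words still carry signal.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence pulls the student's predictions toward the teacher's.
    return F.kl_div(log_probs, soft_targets, reduction="batchmean") * temperature**2
```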
Distilled #StableDiffusion2
> 20x speed up, convergence in 1-4 steps
We already reduced time to gen 50 steps from 5.6s to 0.9s working with @nvidia
Paper drops shortly, will link, model soon
Will be presented @NeurIPS by @chenlin_meng & @robrombach
Interesting eh 🙃 https://t.co/DQJwAaeRBA
— Emad (@EMostaque) December 1, 2022
According to the paper, the Distilled Diffusion model speeds up inference by "at least ten times" compared with existing methods on the ImageNet 256×256 and LAION datasets. For smaller images, the speedup reaches a factor of 256.
Distilled Diffusion is extremely fast, even on Apple hardware
With only four sampling steps, Distilled Diffusion produces images at a quality comparable to standard diffusion models. Where models such as Stable Diffusion need dozens to hundreds of steps for a good image, the distilled version can generate "highly realistic images" in as few as one to four denoising steps. Image manipulations such as AI-assisted image editing also work in as few as two to four steps.
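To see why fewer steps translates directly into speed: each sampling step costs one full forward pass through the denoising network, so a four-step sampler does roughly 1/64 of the work of a 256-step one. A hypothetical sampling loop, reusing the placeholder names from the sketch above:

```python
import torch

@torch.no_grad()
def sample(model, shape, timesteps, alpha, sigma):
    x = torch.randn(shape)                            # start from pure noise
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        x0_pred = model(x, t)                         # one network call per step
        eps = (x - alpha(t) * x0_pred) / sigma(t)
        x = alpha(t_next) * x0_pred + sigma(t_next) * eps
    return x

# A distilled model might run with timesteps = [1.0, 0.75, 0.5, 0.25, 0.0]
# (four steps) where the original sampler needed hundreds of entries.
```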
Delighted to have native support for the AI neural engines for Stable Diffusion from @Apple, one of the 1st optimised models. 8s on MacBook Air M2, will be < 1s with distilled #StableDiffusion2
AI for all. Can't wait to see what everyone creates.
— Emad (@EMostaque) December 1, 2022
Stability AI founder Emad Mostaque is optimistic that this research will soon be applied in practice. Combined with native support for the Neural Engine in Apple Silicon chips, Stable Diffusion could cut image generation from eight seconds to under one second.