Stable Diffusion XL Turbo generates AI images in real time

Nov 29, 2023 Matthias Bastian

Stability AI introduces SDXL Turbo, a new text-to-image model capable of generating high-quality AI images in real time.

SDXL Turbo builds on SDXL 1.0 and implements a new distillation technique for text-to-image models: Adversarial Diffusion Distillation (ADD). ADD reduces the number of sampling steps needed to generate an image from 50 to as few as one while maintaining high image quality.

Like other distillation techniques, ADD uses a previously trained large diffusion image model as a teacher network. The SDXL Turbo research paper describes the new distillation technique in detail.

By integrating ADD, SDXL Turbo offers many of the advantages of Generative Adversarial Networks (GANs), such as single-step image output, while avoiding artifacts or blurring often seen in other distillation methods, Stability AI writes.

According to Stability AI, this yields higher-quality single-step generation than earlier few-step approaches. With just four steps, SDXL Turbo is said to achieve the image quality of SDXL with 50 steps.
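To make the one-step and four-step settings concrete, here is a minimal sketch of how such a model might be called. It assumes the Hugging Face diffusers library, the public model id "stabilityai/sdxl-turbo", and a CUDA GPU, none of which are mentioned in this article; the helper names turbo_call_kwargs and generate are hypothetical. Disabling guidance with guidance_scale=0.0 follows the public model card and is likewise an assumption here.

```python
def turbo_call_kwargs(prompt, steps=1, size=512):
    """Build the pipeline call arguments for a few-step generation.

    Hypothetical helper: the argument names are those of the diffusers
    text-to-image pipelines; the values are assumptions, not from the article.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return {
        "prompt": prompt,
        "num_inference_steps": steps,  # 1 for single-step, 4 for the four-step setting
        "guidance_scale": 0.0,         # assumption: guidance disabled, per the model card
        "height": size,
        "width": size,
    }


def generate(prompt, steps=1):
    """Generate one image. Needs torch, diffusers, and a CUDA GPU (assumed)."""
    # Imported lazily so the helper above works without these packages.
    import torch
    from diffusers import AutoPipelineForText2Image

    pipe = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/sdxl-turbo",      # assumed model id
        torch_dtype=torch.float16,
        variant="fp16",
    )
    pipe.to("cuda")
    return pipe(**turbo_call_kwargs(prompt, steps)).images[0]
```

Calling generate("a photo of a red fox in the snow", steps=1) would request a single denoising step; passing steps=4 would correspond to the four-step configuration discussed in the article.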

SDXL Turbo beats SDXL in just four steps

Stability AI compared SDXL Turbo with several other models (StyleGAN-T++, OpenMUSE, IF-XL, SDXL, and LCM-XL) by having each one generate images from the same prompt.

Human evaluators were then shown two random outputs and asked to select the one that more closely matched the prompt. A second test used the same method to assess image quality.
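The pairwise protocol described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the model labels, and the votes are made up and are not Stability AI's evaluation code or data.

```python
import random
from collections import Counter


def make_blind_pair(output_a, output_b, rng):
    """Return the two outputs in random order so a rater cannot infer the source model."""
    pair = [output_a, output_b]
    rng.shuffle(pair)
    return pair


def preference_rates(votes):
    """Turn a list of per-comparison winners into the share of votes each model received."""
    counts = Counter(votes)
    total = sum(counts.values())
    return {model: n / total for model, n in counts.items()}


# Hypothetical votes from four comparisons (illustrative only, not real results).
votes = ["model_a", "model_a", "model_b", "model_a"]
rates = preference_rates(votes)
print(rates["model_a"], rates["model_b"])  # 0.75 0.25

# Shuffle presentation order for one comparison.
pair = make_blind_pair("image_from_a.png", "image_from_b.png", random.Random(0))
```

The same tally would be run once for prompt alignment and once for image quality, since the article describes the two criteria as separate tests.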

In these blind tests, SDXL Turbo outperformed a 4-step configuration of LCM-XL with only one step, and a 50-step configuration of SDXL with only four steps.

The comparison with 50-step SDXL in particular shows that SDXL Turbo can be significantly faster than a computationally intensive multi-step model while even slightly outperforming it in image quality.

In addition, SDXL Turbo offers significant improvements in inference speed. On an Nvidia A100, SDXL Turbo generates a 512x512 image in just 207 ms (prompt encoding + a single denoising step + decoding, fp16).
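As a quick back-of-the-envelope check, the reported latency can be converted into throughput. The sketch below uses only the 207 ms figure from the text; the function name is hypothetical, and it assumes images are generated one after another with no batching.

```python
def images_per_second(latency_ms):
    """Convert per-image latency in milliseconds to sequential images per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


latency_ms = 207  # reported A100 time for one 512x512 image (fp16, single step)
throughput = images_per_second(latency_ms)
print(f"{throughput:.2f} images per second")  # 4.83 images per second
```

At just under five images per second, results can keep pace with a user refining a prompt, which is the sense in which the article describes generation as real time.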

If you want to test a free demo of Stable Diffusion XL Turbo, you can do so on Clipdrop. The demo is not intended for commercial use. If you are interested in commercial use, you can contact Stability AI.

Sources:

Blog, Paper