
OpenAI introduces a new method that simplifies and stabilizes the training of AI image models and dramatically speeds up image generation.


OpenAI has introduced a new method called sCM (simplified, stabilized and scaled Consistency Models) that improves the training of image generation models. The technique builds on Consistency Models (CMs), a class of diffusion-based generative models that OpenAI has been researching to optimize fast image sampling.

The new sCM method makes training these models more stable and scalable. According to OpenAI, the new models can generate high-quality images in just two sampling steps, while previous methods required significantly more. OpenAI reports that its largest sCM model, with 1.5 billion parameters, generates an image in only 0.11 seconds on a single A100 GPU without special optimizations. According to the company, this amounts to a roughly 50-fold speedup compared to conventional diffusion models.
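To illustrate what "two steps" means, here is a minimal sketch of the generic multistep sampler described for Consistency Models in earlier work. It is not OpenAI's sCM code and does not use sCM's own parameterization. The names `two_step_sample`, `consistency_fn` and `make_gaussian_consistency_fn` are hypothetical, and the noise levels `t_max`, `t_mid` and `t_min` are assumed values. A trained network would take the place of `consistency_fn`; the toy stand-in below is the exact solution for data drawn from a zero-mean Gaussian, so the sketch runs without any model.

```python
import math
import random


def two_step_sample(consistency_fn, n, t_max=80.0, t_mid=0.8, t_min=0.002, rng=None):
    """Draw one sample of n values using exactly two calls to consistency_fn.

    consistency_fn(x, t) is assumed to map a noisy input x at noise level t
    to an estimate of the clean sample. All default noise levels are
    illustrative assumptions, not values from the article.
    """
    rng = rng or random.Random()

    # Step 1: start from pure noise at the largest noise level and jump
    # straight to a clean-sample estimate with one evaluation.
    x_noise = [t_max * rng.gauss(0.0, 1.0) for _ in range(n)]
    x0 = consistency_fn(x_noise, t_max)

    # Re-noise the estimate to an intermediate noise level t_mid.
    scale = math.sqrt(t_mid**2 - t_min**2)
    x_mid = [v + scale * rng.gauss(0.0, 1.0) for v in x0]

    # Step 2: a second evaluation refines the estimate.
    return consistency_fn(x_mid, t_mid)


def make_gaussian_consistency_fn(sigma_data=0.5, t_min=0.002):
    """Toy stand-in for a trained model: the exact consistency function
    when every data value is drawn i.i.d. from N(0, sigma_data^2)."""
    def fn(x, t):
        k = math.sqrt((sigma_data**2 + t_min**2) / (sigma_data**2 + t**2))
        return [k * v for v in x]
    return fn


if __name__ == "__main__":
    toy_model = make_gaussian_consistency_fn(sigma_data=0.5)
    sample = two_step_sample(toy_model, n=8, rng=random.Random(0))
    print(sample)
```

A conventional diffusion sampler would call its network many times in a loop. Here the function is evaluated only twice, which is where the reported speedup comes from.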

3x3 image gallery: snake, rock formation, cockatoo, car mirror, snow leopard, river landscape, beetle, lionfish, ceramic tea set.
Results after two steps. Image: OpenAI

Technical breakthrough in image generation

OpenAI says the new method solves a fundamental problem: Previous Consistency Models were trained with discrete time steps, which introduced additional hyperparameters and made them prone to discretization errors. The researchers developed a simplified theoretical framework that unifies various approaches. This allowed them to identify and fix the main causes of instability when training in continuous time.
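The difference between discrete and continuous time can be sketched as follows. The notation is assumed here for illustration and does not come from the article: $f_\theta$ is the model, $x_t$ is a noisy image at time $t$ on one sampling trajectory, $\epsilon$ and $T$ are the smallest and largest times, and $\Delta t$ is the step size.

```latex
% Self-consistency: every point on one trajectory maps to the same output
f_\theta(x_t, t) = f_\theta(x_{t'}, t') \quad \text{for all } t, t' \in [\epsilon, T],
\qquad f_\theta(x_\epsilon, \epsilon) = x_\epsilon

% Discrete-time training: match adjacent time steps; \Delta t is an extra hyperparameter
f_\theta(x_t, t) \approx f_\theta(x_{t - \Delta t}, t - \Delta t)

% Continuous-time limit (\Delta t \to 0): the output is constant along the trajectory
\frac{\mathrm{d}}{\mathrm{d}t} f_\theta(x_t, t) = 0
```

In the discrete form, the choice of $\Delta t$ must be tuned and a finite step leaves an approximation error. The continuous form removes both, but it has historically been harder to train stably.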


In tests, the system achieved FID scores (lower is better) of 2.06 on the CIFAR-10 dataset and 1.88 on ImageNet with 512x512 pixel images, using just two sampling steps. By this metric, the quality of the generated images is within about ten percent of the best existing diffusion models.
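For context, FID (Fréchet Inception Distance) compares the statistics of generated images with those of real images in the feature space of a pretrained Inception network. The standard definition is given below. The symbols are assumed notation: $\mu_r, \Sigma_r$ are the mean and covariance of the features of real images, and $\mu_g, \Sigma_g$ are those of generated images.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

A score of 0 means the two feature distributions have identical means and covariances, so lower values indicate generated images that are statistically closer to the real data.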

Scaling to record size possible

The new method also excels at scaling. OpenAI successfully trained models with up to 1.5 billion parameters on the ImageNet dataset - an unprecedented size for this type of model. The researchers observed that image quality consistently improves as model size increases.

This suggests the method could work for even larger models. It's an important development for the future of AI image generation - and potentially for video, audio, and 3D models as well.

More details and examples are available in OpenAI's blog post.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Summary
  • OpenAI has developed a new method called sCM that makes AI image generation models faster and simpler to train, enabling high-quality image creation in just two computation steps instead of many more steps previously required.
  • The largest sCM model, with 1.5 billion parameters, generates images in 0.11 seconds on an A100 GPU, which is 50 times faster than traditional diffusion models. In tests, it achieved FID scores of 2.06 on CIFAR-10 and 1.88 on ImageNet with 512x512 pixel images.
  • The method fixes stability issues in previous Consistency Models by using a simplified theoretical framework. OpenAI successfully scaled the approach to 1.5 billion parameters on ImageNet, with image quality improving as model size increases.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.