Here's how OpenAI's DALL-E 3 could leapfrog the competition

Midjourney prompted THE DECODER

All generative AI models for images currently use diffusion models. OpenAI presents an alternative that is significantly faster and could power new models like DALL-E 3.

DALL-E 2, Stable Diffusion, or Midjourney use diffusion models that gradually synthesize an image from noise during image generation. The same iterative process is used in audio or video models.

While diffusion models produce great results and can be controlled via text prompts, they are comparatively slow and require anywhere from 10 to 2,000 times more computing power than GANs. This hinders their use in real-time applications.

OpenAI is therefore developing a new variant of generative AI models called Consistency Models.

Consistency models are designed to combine the advantages of diffusion models and GANs

According to OpenAI, consistency models support fast, one-step image synthesis, but also allow for few-step sampling, "to trade compute for sample quality". Consistency models can therefore produce useful results even without an iterative process, and could soon be suitable for real-time applications.

OpenAI's consistency models learn from an iterative process to skip it later if needed. | Image: OpenAI

Like diffusion models, they also support zero-shot data editing, e.g. for inpainting, colorization, or super-resolution tasks. Consistency models can either distilled from a pre-trained diffusion model or be trained in isolation. According to OpenAI, consistency models are able to outperform many GANs in singe step generation and all other non-adversarial, single-step generative models.

The consistency model (bottom row) can directly generate an image, the diffusion model only shows noise for now. | Image: OpenAI

The company conducted all testing on relatively small networks and image datasets, for example, training a neural network to synthesize cat images. All models were released by the company as open source for research purposes.

A new generative AI architecture for DALL-E 3 and video synthesis?

According to the authors, there are also striking similarities to other AI techniques used in other fields, such as deep Q-learning from reinforcement learning or momentum-based contrastive learning from semi-supervised learning. "This offers exciting prospects for cross-pollination of ideas and methods among these diverse fields.," the team says.

In the months leading up to the release of DALL-E 2, OpenAI had published several articles on diffusion models and finally presented GLIDE, a very impressive model at the time. So the research on consistency models could be an indication that OpenAI is looking for new and more effective generative AI architectures that could, for example, enable a much faster DALL-E 3 and be used for real-time video generation.

Recommendation

AI research

New Othello experiment supports the world model hypothesis for large language models

OpenAI's current work should therefore be seen as a feasibility study, with a larger AI model being the likely next step. The same architecture could eventually be used for other modalities or for 3D content synthesis.

Join our community

Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.

Here's how OpenAI's DALL-E 3 could leapfrog the competition

Consistency models are designed to combine the advantages of diffusion models and GANs

A new generative AI architecture for DALL-E 3 and video synthesis?

New Othello experiment supports the world model hypothesis for large language models

OpenAI launches new ChatGPT agent that automates complex tasks for Pro, Plus, and Team

Anthropic, OpenAI, Google, and xAI have landed Pentagon contracts worth up to $200 million

Apple weighs abandoning its own AI for Siri as it tests models from OpenAI and Anthropic

OpenAI launches new ChatGPT agent that automates complex tasks for Pro, Plus, and Team

Kimi-K2 is the next open-weight AI milestone from China after Deepseek

New Energy-Based Transformer architecture aims to bring better "System 2 thinking" to AI models

Here's how OpenAI's DALL-E 3 could leapfrog the competition

Consistency models are designed to combine the advantages of diffusion models and GANs

A new generative AI architecture for DALL-E 3 and video synthesis?

Share

Bank details