
A team of researchers from NYU, MIT, and Google has found a way to improve AI-generated images by borrowing ideas from recent AI reasoning models like OpenAI's o1.


Their approach enhances image quality during the generation process itself, building on how diffusion models already improve images through denoising steps. In their paper "Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps," the researchers introduce two key components: verifiers that act as quality checkers, and search algorithms that use these quality ratings to find better images.

What makes this approach interesting is that it improves results without retraining the AI model - instead, it optimizes the generation process itself, similar to how models like OpenAI's o1, Google's Gemini 2.0 Flash Thinking, and DeepSeek's R1 refine their output while generating text.

Three search algorithms put to the test

The system uses several types of verifiers to evaluate different aspects of each generated image. These include an "Aesthetic Score" for visual quality, a "CLIPScore" that checks how well the image matches the text prompt, and "ImageReward," which evaluates images based on human-like criteria. The researchers combined these evaluators into a "verifier ensemble" to consider multiple quality factors at once.
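The ensemble idea can be sketched as a weighted combination of individual verifier scores. The scoring functions below are toy placeholders, not the actual learned verifiers from the paper, and the weights are made up for illustration:

```python
# Toy stand-ins for the paper's verifiers (Aesthetic Score, CLIPScore,
# ImageReward). A real pipeline would call learned models here; these
# placeholders just return numbers so the ensemble logic is runnable.
def aesthetic_score(image):
    # placeholder for a visual-quality rating
    return sum(image) / len(image)

def clip_score(image, prompt):
    # placeholder for image-text alignment
    return 1.0 - abs(len(prompt) % 10 - image[0]) / 10

def image_reward(image, prompt):
    # placeholder for a human-preference-trained reward
    return (aesthetic_score(image) + clip_score(image, prompt)) / 2

def verifier_ensemble(image, prompt, weights=(0.3, 0.4, 0.3)):
    """Combine several verifier scores into one quality rating.

    The weights are hypothetical; the paper's exact combination
    scheme may differ.
    """
    scores = (
        aesthetic_score(image),
        clip_score(image, prompt),
        image_reward(image, prompt),
    )
    return sum(w * s for w, s in zip(weights, scores))
```

The search algorithms described next only need this single scalar score, which is what lets several quality criteria be considered at once.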


The team developed three search algorithms:
  • Random Search generates multiple versions and picks the best one, though too many attempts can lead to overly similar images.
  • Zero-Order Search starts with a random candidate and systematically probes nearby variations for improvements.
  • Search over Paths, the most sophisticated approach, optimizes along the entire generation trajectory, injecting search at intermediate denoising steps.
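The first two strategies can be sketched in a few lines. Here `generate` and `verifier` are toy stand-ins for a diffusion sampler and a quality verifier; the real system would run a full denoising chain and a learned scorer at each evaluation:

```python
import random

def generate(noise):
    """Stand-in for a diffusion sampler: maps an initial noise vector
    to a score-able 'image'. A deterministic toy function here."""
    return [(x * 0.9) % 1.0 for x in noise]

def verifier(image):
    """Stand-in quality rating (higher is better)."""
    return sum(image) / len(image)

def random_search(n_candidates, dim=4, rng=None):
    """Random Search: sample several starting noises, generate an image
    from each, and keep the one the verifier rates highest."""
    rng = rng or random.Random(0)
    best_img, best_score = None, float("-inf")
    for _ in range(n_candidates):
        noise = [rng.random() for _ in range(dim)]
        img = generate(noise)
        score = verifier(img)
        if score > best_score:
            best_img, best_score = img, score
    return best_img, best_score

def zero_order_search(steps, step_size=0.05, dim=4, rng=None):
    """Zero-Order Search: start from one noise and repeatedly try small
    perturbations nearby, keeping any that improve the verifier score."""
    rng = rng or random.Random(0)
    noise = [rng.random() for _ in range(dim)]
    best = verifier(generate(noise))
    for _ in range(steps):
        cand = [x + rng.uniform(-step_size, step_size) for x in noise]
        score = verifier(generate(cand))
        if score > best:
            noise, best = cand, score
    return generate(noise), best
```

Search over Paths would additionally branch and re-score at intermediate denoising steps rather than only at the initial noise, which is harder to show without a real sampler.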

Inference-time scaling delivers significantly better results

Testing showed all three methods significantly improved image quality - even smaller models with this optimization outperformed larger models without it. However, there's a trade-off: better images require more computing time. The researchers found that about 50 extra computing steps per image strikes a good balance between quality and speed.
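The trade-off can be made concrete with a back-of-envelope count of model evaluations. The accounting below assumes each search candidate re-runs the full denoising chain; the step counts are illustrative, not figures from the paper:

```python
def total_nfes(denoise_steps, search_candidates):
    """Rough count of model evaluations (NFEs) when every search
    candidate runs the full denoising chain. Illustrative only."""
    return denoise_steps * search_candidates

baseline = total_nfes(50, 1)      # plain 50-step sampling, no search
with_search = total_nfes(50, 2)   # one extra full candidate
overhead = with_search - baseline # extra compute spent on search
```

Under this toy accounting, the roughly 50 extra steps the researchers identify as a sweet spot would correspond to about one additional full candidate for a 50-step sampler.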

A collage of AI-generated image series (lighthouses, hourglasses, a saxophonist, teddy bears, and a cat press conference) at different stages of refinement. The series shows the difference between spending extra compute on more denoising steps versus on the combination of verifier and search; quality and prompt adherence often improve significantly when search is added. | Image: Google DeepMind

Different verifiers showed distinct preferences: the Aesthetic Score tends to produce more artistic images, while CLIPScore favors realistic ones that closely match the text prompt. This means users need to choose their verifier based on the kind of results they're looking for.

Summary
  • Researchers from NYU, MIT, and Google DeepMind have developed a method that optimizes AI-generated images as they are generated, without retraining the model, using verifiers and search algorithms.
  • The team tested three search methods: Random Search generates multiple image versions and selects the best one, Zero-Order Search systematically optimizes in the neighborhood of a starting image, and Search over Paths improves the entire generation process.
  • The tests showed that all three methods can significantly improve image quality. About 50 additional computational steps per image struck a good compromise between quality gains and speed, and even smaller models with this technique outperformed larger models without it.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.