Google's Re-Imagen could set new quality standards for AI images

Generative AI systems like Stable Diffusion require costly retraining to learn new concepts. Google's Re-Imagen takes a more efficient approach.

OpenAI's DALL-E 2 or Midjourney generate high-quality images based on text. Their generative capabilities, however, are limited to objects or styles that are part of the companies' training data.

A popular alternative is therefore the Open-Source model Stable Diffusion. It runs locally on a user's graphics card or in the cloud and - thanks to some fine-tuning techniques - can learn new concepts such as styles, objects, or people, provided the appropriate hardware resources are available.

Initial attempts used Textual Inversion as a method for post-training, but Dreambooth has now become the standard. The method developed by Google to personalize large text-to-image models like Imagen has been adapted by the open-source community for Stable Diffusion.

Stable Diffusion can be personalized with your own images, but it is a laborious approach

Dreambooth lets you customize Stable Diffusion to your own needs with sample images. The method achieves good results with just a few images. While the hardware requirements were extreme at the beginning, optimized Dreambooth versions can be run with Nvidia graphics cards with 10 gigabytes of VRAM.

Dreambooth is becoming popular for creating custom Stable Diffusion models using your images.

Here is a beginner friendly thread on how it works: ? pic.twitter.com/jlLdOqbWBf

- Divam Gupta (@divamgupta) November 1, 2022

In the Dreambooth paper, Google compares DALL-E 2 and Imagen to show how the method allows a generative AI model to learn, say, a new clock face for an alarm clock.

Dreambooth allows Google's Imagen to learn new concepts. | Image: Google

The results are impressive, and numerous people on Twitter are sharing examples of how extensively Dreambooth can be used. So Dreambooth solves a key problem of generative AI models: they often struggle to generate images of unusual and unfamiliar objects or styles. But the method is not necessarily scalable, as each concept requires computationally intensive training.

Google's Re-Imagen shows a scalable alternative to Dreambooth

A group at Google is now demonstrating the Retrieval-Augmented Text-to-Image Generator (Re-Imagen). This new method allows a generative AI model to generate images of rare or never-before-seen objects.

Recommendation

AI research

Nvidia researcher Jim Fan expects "GPT-3 moment" for robotics in the next few years

As the name (Retrieval-Augmented) implies, Re-Imagen retrieves new information from an external database rather than being re-trained with additional data.

Given a text prompt, Re-Imagen accesses an external multi-modal knowledge base to retrieve relevant (image, text) pairs, and uses them as references to generate the image.

From the paper

Re-Imagen retrieves semantic and visual information about unknown or rare objects via the additional input, improving its accuracy in image generation.

To accomplish this, the Google team trained Re-Imagen with a new dataset that includes three modalities (image, text, and retrieval). Thus, the model learned to use text input and retrievals of the external database for a generation.

In some examples, Google shows that Re-Imagen achieves significantly better results for rare or unknown objects than Imagen, DALL-E 2, or Stable Diffusion.

Join our community

Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.

Google Re-Imagen accesses an external database to better generate images of rare or unseen objects. | Image: Google

However, the new method also has disadvantages:

First, because Re-Imagen is sensitive the to retrieved image-text pairs it is conditioned on, when the retrieved image is of low-quality, there will be a negative influence on the generated image. Second, Re-Imagen sometimes still fail to ground on the retrieved entities when the entity’s visual appearance is out of the generation space. Third, we noticed that the super-resolution model is less effective, and frequently misses low-level texture details of the visual entities.

From the paper

The team plans to investigate these limitations further and address them in future work.

Google's Re-Imagen could set new quality standards for AI images

Stable Diffusion can be personalized with your own images, but it is a laborious approach

Google's Re-Imagen shows a scalable alternative to Dreambooth

Nvidia researcher Jim Fan expects "GPT-3 moment" for robotics in the next few years

Google shows AI filmmaking tool, XR glasses and launches $250 Gemini subscription

Gemini 2.5 Pro: Google has finally caught up

Google is rolling out its AI search to more countries

AI coding can make developers slower even if they feel faster

Musk unveils Grok 4 as xAI’s new AI model that beats OpenAI and Google on major benchmarks

"Cat attack" on reasoning model shows how important context engineering is

Google's Re-Imagen could set new quality standards for AI images

Stable Diffusion can be personalized with your own images, but it is a laborious approach

Google's Re-Imagen shows a scalable alternative to Dreambooth

Share

Bank details