
Meta's new AI model "Imagine Yourself" can generate a variety of personalized images from a single reference image, without additional training for each person.


Meta has introduced a new AI model called "Imagine Yourself" that can generate personalized images from a single reference image without requiring additional training. The model can create multiple new images of a person in various poses, styles, and environments based on a single reference image.

Unlike previous approaches that needed to be retrained for each individual, "Imagine Yourself" operates without person-specific training. The model simultaneously processes the reference image and text instruction, allowing it to adapt flexibly to new people and instructions.

Meta relies on synthetic training data

To achieve these advancements, Meta employs several novel techniques. Firstly, "Imagine Yourself" utilizes synthetic training pairs, generating synthetic variants that correspond to real reference images. This enables the model to learn how to portray individuals in different poses and styles without adhering too closely to the reference image.
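As a rough illustration of what such a training pair could look like, the sketch below pairs one real reference photo with several generated variants. The `TrainingPair` fields, the `build_synthetic_pairs` helper, and the file paths are hypothetical and chosen only for this example; the article does not describe Meta's actual data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingPair:
    # All field names are illustrative assumptions, not Meta's schema.
    reference: str   # path to the real reference photo (model input)
    target: str      # path to the image the model should learn to produce
    prompt: str      # text instruction describing the target
    synthetic: bool  # True if the target was generated rather than photographed


def build_synthetic_pairs(reference, variants):
    """Pair one real reference image with each of its synthetic variants.

    `variants` maps a prompt to the path of a generated image showing the
    same person in a different pose, style, or setting.
    """
    return [
        TrainingPair(reference=reference, target=path, prompt=prompt, synthetic=True)
        for prompt, path in variants.items()
    ]


# Hypothetical example data: one real photo, two generated variants.
pairs = build_synthetic_pairs(
    "person_001/real.jpg",
    {
        "smiling, head turned left": "person_001/syn_0.jpg",
        "oil painting, standing on a beach": "person_001/syn_1.jpg",
    },
)
for pair in pairs:
    print(pair.prompt, "->", pair.target)
```

Because the target differs from the reference in pose or style, a model trained on such pairs cannot succeed by simply copying the input image.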


Secondly, the model features a new architecture with three parallel text processing modules and a trainable image processing module. These modules process image and text concurrently, facilitating better coordination between the two. Meta also applies multi-stage fine-tuning, training the model alternately with real and synthetic data to optimize identity preservation and instruction compliance.
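The alternating fine-tuning described here can be pictured as a simple schedule. The sketch below is only a toy: the number of stages, the choice to start with real data, and the `train_stage` callback are assumptions made for illustration, since the article does not specify them.

```python
from itertools import cycle, islice


def alternating_schedule(num_stages, sources=("real", "synthetic")):
    """Return the data source used in each fine-tuning stage, in order.

    Starting with "real" is an assumption; the article only says the two
    kinds of data are used alternately.
    """
    if num_stages < 0:
        raise ValueError("num_stages must be non-negative")
    return list(islice(cycle(sources), num_stages))


def run_fine_tuning(num_stages, train_stage):
    """Call the hypothetical `train_stage(stage_index, source)` once per stage."""
    return [
        train_stage(index, source)
        for index, source in enumerate(alternating_schedule(num_stages))
    ]


def fake_train_stage(index, source):
    # Placeholder for a real training step; it only reports what it would do.
    return f"stage {index}: fine-tune on {source} data"


for line in run_fine_tuning(4, fake_train_stage):
    print(line)
```

In a real pipeline, `fake_train_stage` would be replaced by an actual training loop over the corresponding dataset.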

Figure: model architecture. Image: Meta
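To make the idea of parallel text modules plus an image module more concrete, here is a minimal toy sketch. The stand-in encoders, the one-number "tokens", and especially the choice to join the outputs by concatenation are assumptions for illustration; the article only states that the modules process text and image concurrently.

```python
def toy_text_encoder(seed):
    """Return a stand-in text encoder that emits one token per word."""
    def encode(prompt):
        # Each "token" is a one-element vector; real encoders emit embeddings.
        return [[float(len(word) + seed)] for word in prompt.split()]
    return encode


def toy_image_encoder(image_pixels):
    """Stand-in for the trainable image module: one token per pixel value."""
    return [[float(p)] for p in image_pixels]


def build_conditioning(prompt, image_pixels, text_encoders, image_encoder):
    """Run every encoder on its input and join the token sequences.

    Concatenation is an assumed way of combining the outputs. The encoders
    are independent of each other, so they could run side by side.
    """
    tokens = []
    for encode in text_encoders:
        tokens.extend(encode(prompt))
    tokens.extend(image_encoder(image_pixels))
    return tokens


# Three text encoders, as in the article, plus one image encoder.
text_encoders = [toy_text_encoder(seed) for seed in range(3)]
conditioning = build_conditioning(
    "a person hiking", [0.1, 0.5], text_encoders, toy_image_encoder
)
print(len(conditioning))  # 3 encoders * 3 words + 2 image tokens = 11
```

The image generator would then be conditioned on this combined sequence, so that both the instruction and the person's identity influence the result.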

According to Meta, "Imagine Yourself" outperforms existing approaches like InstantID or IP-Adapter in executing complex instructions that require significant changes to the reference image. For instance, the model can alter a person's facial expression or head posture and place them in entirely new environments.

"Imagine Yourself" still has weaknesses and is not yet available

However, the study also reveals that competing models occasionally surpass "Imagine Yourself" in terms of identity preservation. Meta attributes this to the fact that these models often simply copy parts of the reference image, potentially resulting in unnatural-looking results.

"Imagine Yourself" can also be extended to generate images featuring multiple people. To accomplish this, the image information from several reference images is processed in parallel, enabling the creation of group photos with known individuals in new poses and environments.

Meta intends to continue researching "Imagine Yourself," with future priorities including an extension to video generation and better handling of very complex poses such as jumps. The model and code are not yet publicly available.

Summary
  • Meta has developed a new AI model called "Imagine Yourself" that can generate multiple personalized images of a person from a single reference image without the need for additional training.
  • The model uses synthetic training data and a new architecture with parallel text and image processing modules. This allows it to respond flexibly to new people and instructions, and to make complex changes to the reference image.
  • According to Meta, "Imagine Yourself" can implement complex instructions better than existing approaches, but competing models occasionally surpass it in identity preservation. The company plans to continue its research and extend the model to video generation.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.