Date: 2024-09-23

Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.

Content Summary

Meta's new AI model "Imagine Yourself" can generate a variety of personalized images from a single reference image, without additional training.

Meta has introduced a new AI model called "Imagine Yourself" that generates personalized images from a single reference image without requiring additional training. From that one image, the model can create multiple new images of the person in various poses, styles, and environments.

Unlike previous approaches that needed to be retrained for each individual, "Imagine Yourself" operates without person-specific training. The model simultaneously processes the reference image and text instruction, allowing it to adapt flexibly to new people and instructions.

Meta relies on synthetic training data

To achieve these advancements, Meta employs several novel techniques. Firstly, "Imagine Yourself" utilizes synthetic training pairs, generating synthetic variants that correspond to real reference images. This enables the model to learn how to portray individuals in different poses and styles without adhering too closely to the reference image.

Secondly, the model features a new architecture with three parallel text processing modules and a trainable image processing module. These modules process image and text concurrently, facilitating better coordination between the two. Meta also applies multi-stage fine-tuning, training the model alternately with real and synthetic data to optimize identity preservation and instruction compliance.
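
To make the idea of parallel conditioning more concrete, here is a minimal sketch in plain Python. It is not Meta's implementation, since the model and code are not public. It assumes that each text module and the image module produce a list of token vectors, that the generator attends to each list in a separate branch, and that the branch outputs are simply summed. The function names `attend` and `parallel_conditioning` and the summation step are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-query scaled dot-product attention over one set of tokens."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def parallel_conditioning(
    query: Vector,
    text_branches: List[List[Vector]],
    image_tokens: List[Vector],
) -> Vector:
    """Attend to every conditioning source in its own branch, then sum.

    Assumption: branch outputs are fused by addition. The article only
    states that text and image are processed concurrently.
    """
    fused = [0.0] * len(query)
    for tokens in text_branches + [image_tokens]:
        branch_out = attend(query, tokens, tokens)
        fused = [f + b for f, b in zip(fused, branch_out)]
    return fused


# Toy example: three text branches and one image branch, one token each.
query = [1.0, 0.0]
text_branches = [[[1.0, 0.0]], [[0.0, 1.0]], [[1.0, 1.0]]]
image_tokens = [[2.0, 2.0]]
fused = parallel_conditioning(query, text_branches, image_tokens)
```

Because every branch sees the same query and contributes its own output, neither the text instruction nor the reference image has to pass through the other first, which is one plausible reading of "better coordination between the two".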

According to Meta, "Imagine Yourself" outperforms existing approaches such as InstantID and IP-Adapter in executing complex instructions that require significant changes to the reference image. For instance, the model can alter a person's facial expression or head posture and place them in entirely new environments.

"Imagine Yourself" still has weaknesses and is not yet available

However, the study also reveals that competing models occasionally surpass "Imagine Yourself" in terms of identity preservation. Meta attributes this to the fact that these models often simply copy parts of the reference image, potentially resulting in unnatural-looking results.

"Imagine Yourself" can also be extended to generate images featuring multiple people. To accomplish this, the image information from several reference images is processed in parallel, enabling the creation of group photos with known individuals in new poses and environments.

Meta intends to continue researching "Imagine Yourself," with future priorities including an extension to video generation and better handling of very complex poses such as jumps. The model and code are not yet publicly available.

