Stable Diffusion-based "MagicMix" from Bytedance turns dogs into coffee makers

TikTok parent company Bytedance demonstrates a new image generation method that can blend two semantic concepts into a new one.

"Semantic mixing" is what the Bytedance research team calls the process of instructing a diffusion model to mix two semantic concepts into a new one: For example, a watermelon and a lamp become a lamp in the shape of a watermelon. A tiger and a rabbit become a rabbit with tiger stripes.

Some examples of semantic concepts mixed with MagicMix. | Image: Bytedance

The researchers describe semantic mixing as a difficult problem because even humans could not always tell exactly what the result of such mixing might be.

Melon lamps and tiger rabbits

Unlike style transfer, in which the image content remains unchanged and only the style is adapted (a rendered graphic becomes a drawn graphic, for example), semantic mixing creates new motifs. The layout and geometry of the original image are preserved, which contributes to greater precision and stability in image generation.

The Bytedance team uses Stable Diffusion 1.4 as the underlying image model for the MagicMix process. A diffusion model generates an image by gradually removing noise: the early denoising steps establish the layout and shape of the image, and the semantic content only emerges in later steps.

MagicMix takes advantage of this two-step behavior: it first determines the rough layout of the subject from an image or a text prompt, then switches the prompt to the second semantic concept partway through generation. No spatial mask or retraining is required.
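The prompt switch described above can be illustrated with a small toy schedule. This is only a sketch of the general idea, not the researchers' implementation: the function name `prompt_schedule`, the `switch_fraction` parameter and the example prompts are hypothetical, and no actual diffusion model is called.

```python
from typing import List


def prompt_schedule(
    layout_prompt: str,
    content_prompt: str,
    num_steps: int = 50,
    switch_fraction: float = 0.4,  # hypothetical knob, not taken from the paper
) -> List[str]:
    """Return the prompt that would condition each denoising step.

    Index 0 is the noisiest step, index num_steps - 1 the cleanest.
    Early steps use the layout prompt, later steps the content prompt.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    if not 0.0 <= switch_fraction <= 1.0:
        raise ValueError("switch_fraction must be between 0 and 1")

    # Step index at which conditioning changes from layout to content.
    switch_step = round(num_steps * switch_fraction)
    return [
        layout_prompt if step < switch_step else content_prompt
        for step in range(num_steps)
    ]


# Toy usage: 10 steps, the first 30 percent conditioned on the layout concept.
schedule = prompt_schedule(
    "a photo of a rabbit", "a photo of a tiger", num_steps=10, switch_fraction=0.3
)
for step, prompt in enumerate(schedule):
    print(step, prompt)
```

In a real pipeline, each entry of the schedule would be the text conditioning passed to the denoiser at that step. A larger `switch_fraction` would presumably keep more of the first concept's shape, a smaller one would give the second concept more influence.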

AI image generation becomes more flexible

The process even works in reverse: MagicMix can remove semantic concepts from images. However, the results are rather bizarre compared to concept mixing. For example, a cat image without a cat concept becomes a strange chameleon. A dog picture without a dog becomes a sphinx-like cat.

A cat without a cat is a giant chameleon.

Mixing animal breeds is interesting. The aforementioned tiger rabbit is a fancy example, but it can also be done in more realistic ways, such as crossing dog breeds. Here we can see that the mixed concepts can reflect reality. Fictional motifs are also possible, which creatives could use as inspiration for their work.

One of MagicMix's strengths, its ability to maintain the layout, is also a weakness: the method fails when it mixes content concepts without similarity in form. Mix a van with a cat, and instead of a cat with tires or a van with whiskers, you get a cat on a van or … well.

Despite this limitation, the Bytedance team is convinced that MagicMix will create further opportunities and scope for the use of image AI systems.

Thanks to the strong capability in generating novel concepts, our MagicMix supports a large variety of creative applications, including semantic style transfer (e.g., generating a new sign given a reference sign layout and a certain desired content), novel object synthesis (e.g., generating a lamp that looks like a watermelon slice), breed mixing (e.g., generating a new species by mixing “rabbit” and “tiger”) and concept removal (e.g., synthesizing a non-orange object that looks like an orange). Although the solution is simple, it paves a new direction in the computational graphics field and provides new possibilities for AI-aided designs for artists in a wide field, such as entertainment, cinematography, and CG effects.

Excerpt from the paper

Because MagicMix is based on Stable Diffusion, which in turn is trained with a LAION dataset, among others, the Bytedance team points to "social and cultural biases" in image generation.
