Meta's new image AI makes AI graphics more controllable

Meta demonstrates an AI system designed to give people more control over their AI art. Systems like this could also be a tool to help build the Metaverse.

AI-generated images have been making their rounds on the web for years. While a few years ago it was primarily GAN-based systems that generated deceptively real cats or people, today it is multimodal models that allow targeted creations via text input. Users of DALL-E 2, Midjourney and Craiyon (formerly DALL-E mini) are flooding Twitter, Instagram and other channels with AI images.

However, concrete control over these systems is often quite limited: Enter text and wait for images - that's it. With some systems, variants of interesting results can be generated. The creativity of DALL-E can also be directed specifically to individual places in an existing image to change this section. Then, for example, a flamingo appears in what is actually an empty pool.

Meta experiments with text plus sketch

In a new paper, Meta researchers now demonstrate a multimodal AI system that enables more targeted image generation. Users can feed "Make-A-Scene" a sketch and then start the generation process by entering additional text.

While with other AI systems the results are often difficult to predict, Make-A-Scene allows people to target their ideas more precisely, Meta writes.

Video: Meta

This requires defining the basic scene layout in the sketch. The text input then fills the sketch with AI-generated graphics. The model can also create its own layouts via text input - however, users lose some control in this way.

Meta sees Make-A-Scene as a step toward more targeted AI creation

According to Meta, some artists have been given test access to Make-A-Scene. One of the developers has tested the system with his children, for example, to make monster bears ride on trains. There are no plans to release it for now: For Meta, Make-A-Scene is an experiment in AI creativity that focuses on user control.

A monster robot bear riding a train. | Image: Meta

To harness the potential of AI to foster creative expression, humans must be able to shape and control the content generated by the system, Meta writes. To do that, a suitable system must be intuitive and easy to use - using speech, text, sketches, gestures or even eye movements.

Recommendation

AI in practice

Nvidia positions GR00T N1 to dominate robotics ecosystem

Thanks to AI support, Meta hopes to create a new class of creative tools that will allow many people to create expressive messages in 2D, XR and virtual worlds.

Join our community

Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.

Meta's new image AI makes AI graphics more controllable

Meta experiments with text plus sketch

Meta sees Make-A-Scene as a step toward more targeted AI creation

Nvidia positions GR00T N1 to dominate robotics ecosystem

OpenAI DALL-E 2 Prompt Guide: How to use the generative AI model

AI significantly improves early detection of sepsis in hospitals

AI artwork wins art competition and artists are upset

Cloudflare CEO Matthew Prince sees trouble ahead for the open web

New Othello experiment supports the world model hypothesis for large language models

ChatGPT might be draining your brain, MIT warns - what ‘cognitive debt’ means for you

Meta's new image AI makes AI graphics more controllable

Meta experiments with text plus sketch

Meta sees Make-A-Scene as a step toward more targeted AI creation

Share

Bank details