GAIA-1 is a generative AI model for autonomous driving

Jun 18, 2023

Key Points

Video data is a bottleneck when training AI models for autonomous driving. Synthetic data can help remedy this situation.
The generative AI model GAIA-1 is able to generate plausible traffic videos of several minutes in length from only a few seconds of input, which can be used to train AI models.
It can also generate specific scenes based on text input, such as driving maneuvers with many buses on the road or driving the wrong way.

AI models for autonomous driving have to learn countless traffic situations from videos, both inside and outside the rules of the road. But training material is a bottleneck.

Synthetic data could help alleviate this bottleneck for all manufacturers, even those that don't yet have large fleets in real-world traffic. This is exactly what the GAIA-1 generative AI model is designed to provide. It comes from Wayve, a British company founded in 2017 that specializes in deep learning techniques for autonomous driving. GAIA stands for "Generative Artificial Intelligence for Autonomy."

A multimodal "world model" for road traffic

GAIA-1 has been trained on a multimodal corpus of driving data, including video, text, and vehicle inputs. Similar to how language models learn to predict the next likely tokens in a text sequence, GAIA-1 learned to predict the next frames in a video sequence.
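The next-frame idea can be illustrated with a minimal autoregressive loop. This is only a sketch under stated assumptions, not Wayve's implementation: frames are reduced to plain integers, `toy_next_frame` is a hypothetical stand-in for a learned predictor, and the fixed-size context window (`context_len`) is an assumption made for illustration.

```python
from collections import deque
from typing import Callable, List, Sequence

Frame = int  # hypothetical stand-in for one (discretized) video frame


def toy_next_frame(context: Sequence[Frame]) -> Frame:
    """Hypothetical predictor: the next frame is the last frame plus one.

    A real model would be a trained neural network; this rule only
    makes the loop below runnable.
    """
    return context[-1] + 1


def rollout(
    seed: Sequence[Frame],
    predict: Callable[[Sequence[Frame]], Frame],
    steps: int,
    context_len: int,
) -> List[Frame]:
    """Generate `steps` future frames, feeding each prediction back in."""
    if not seed:
        raise ValueError("seed must contain at least one frame")
    # Assumed sliding window: only the most recent frames are visible.
    context = deque(seed, maxlen=context_len)
    generated: List[Frame] = []
    for _ in range(steps):
        nxt = predict(tuple(context))
        generated.append(nxt)
        context.append(nxt)  # the oldest frame drops out once the window is full
    return generated


seed = [0, 1, 2]  # "a few seconds" of input, in toy form
future = rollout(seed, toy_next_frame, steps=5, context_len=3)
print(future)
```

In this toy setup, after three steps the window holds only generated frames, so later predictions no longer depend directly on the seed.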

However, according to Wayve, GAIA-1 is not a "standard generative video model". Rather, it is a "true world model" that "learns to understand and disentangle the important concepts of driving" such as different vehicles and their characteristics, roads, buildings, or traffic lights.

The true marvel of GAIA-1 lies in its ability to manifest the generative rules that underpin the world we inhabit. Through extensive training on a diverse range of driving data, our model synthesises the inherent structure and patterns of the real world, enabling it to generate remarkably realistic and diverse driving scenes.

Wayve
As evidence for this bold claim, Wayve cites GAIA-1's ability to generate "long plausible futures" from a few seconds of video input. The further into the future the model predicts, the less the short input matters: scenes generated later contain no content from the source material.

"This shows that GAIA-1 understands the rules that underpin the world we inhabit," Wayve writes. The simulated driving behavior is realistic, as is the environment of parked and moving cars.

The model is designed to offer many controls over both the moving vehicle and its environment. It can, for example, simulate driving situations that are not included in the training data. That would make it possible to generate dangerous driving situations against which AI models for autonomous driving can be evaluated. GAIA-1 builds on research on Model-Based Imitation Learning for Urban Driving.

Text-to-Traffic

GAIA-1 can be instructed in natural language to create specific scenes, such as navigating between multiple buses in the video below.

Even if a scene is already running, you can modify it by entering text. In the following video, the prompt "It's night, and we have turned on our headlights" leads to a generated night drive.

Wayve describes its model as "a unique way to better train autonomous systems to more efficiently navigate complex real-world scenarios," and intends to use it to further develop its own AI models for autonomous driving. The company plans to release more information about GAIA-1 in the coming months.

Source: Wayve.ai | Paper