Align Your Gaussians: Nvidia's new AI creates 3D animations from text

Dec 22, 2023 Maximilian Schreiner

Researchers from Nvidia, the University of Toronto and MIT have developed a new AI system that can generate 3D animations from text descriptions.

Align Your Gaussians (AYG) represents 3D shapes as collections of 3D Gaussian functions and models their motion with deformation fields, which define how the Gaussians move over time to produce an animation. These so-called "3D Gaussians" have emerged in recent months as a possible alternative to the popular neural radiance fields (NeRFs).
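To make the idea concrete, here is a minimal sketch of a set of 3D Gaussians moved by a time-dependent deformation field. It is an illustration only, not Nvidia's implementation: the names `means`, `deformation_field` and `gaussians_at_time` are hypothetical, and a hand-written sine sway stands in for whatever learned field the real system uses.

```python
import numpy as np

# Hypothetical toy scene: 4 Gaussians, each with a center, a size and an opacity.
rng = np.random.default_rng(0)
means = rng.normal(size=(4, 3))      # 3D centers of the Gaussians
scales = np.full((4, 3), 0.1)        # per-axis extent (unchanged by the toy field)
opacities = np.full(4, 0.8)          # per-Gaussian opacity (also unchanged)

def deformation_field(positions, t):
    """Return a 3D offset for every Gaussian at time t.

    Assumption: a simple sine sway along x replaces a learned field.
    """
    offsets = np.zeros_like(positions)
    offsets[:, 0] = 0.2 * np.sin(2 * np.pi * t + positions[:, 1])
    return offsets

def gaussians_at_time(t):
    """Centers of the Gaussians after applying the deformation at time t."""
    return means + deformation_field(means, t)

# Sample the animation at 5 time steps between t=0 and t=1.
frames = np.stack([gaussians_at_time(t) for t in np.linspace(0.0, 1.0, 5)])
```

Each entry in `frames` holds the Gaussian centers for one moment in time; a renderer would draw the Gaussians at those positions to produce one frame of the animation.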

https://the-decoder.de/wp-content/uploads/2023/12/AYG-Nvidia-examples.mp4?_=1

Video: Nvidia

The process combines the strengths of different AI models. The Stable Diffusion text-to-image model ensures that individual frames look realistic. A text-to-video model, trained on large video datasets, provides temporal feedback so that motion is smooth. A 3D-aware multi-view model ensures that the generated objects remain geometrically consistent from different angles.

By combining these models in a coordinated training process, the team says AYG can optimize both the 3D shape representation and the deformation fields to produce animations with lively motion, realistic textures, and geometric consistency, directly from text prompts such as "a horse galloping across a meadow".
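The following toy sketch shows the general pattern of optimizing one set of parameters against several feedback signals at once. It is not the AYG training procedure: simple quadratic losses stand in for the image, video and multi-view models, and the `targets`, `weights` and learning rate are all made-up values chosen for illustration.

```python
import numpy as np

# Assumption: each "model" is replaced by a quadratic loss pulling toward a target.
targets = {
    "image": np.array([1.0, 0.0]),
    "video": np.array([0.0, 1.0]),
    "multiview": np.array([1.0, 1.0]),
}
# Hypothetical weights that set how strongly each signal counts.
weights = {"image": 1.0, "video": 1.0, "multiview": 2.0}

def total_gradient(params):
    """Gradient of the weighted sum of squared distances to all targets."""
    grad = np.zeros_like(params)
    for name, target in targets.items():
        grad += weights[name] * 2.0 * (params - target)
    return grad

# Plain gradient descent on the combined objective.
params = np.zeros(2)
for _ in range(200):
    params -= 0.05 * total_gradient(params)
```

With these numbers the parameters settle at the weighted average of the three targets, which illustrates how the weights trade off competing signals against each other.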

According to the researchers, AYG can also generalize to some new concepts that were not seen during training.

Team sees applications for creative tools and synthetic data

AYG also introduces new techniques to extend and link animations over longer time scales than is possible with existing text-to-video models. In one example, the team shows how dogs switch from a walking animation to a barking animation.

https://the-decoder.de/wp-content/uploads/2023/12/AYG_new_dogs.mp4?_=2

Video: Nvidia

The researchers believe that in the future these methods could also be used to generate 4D scenes and simulations of any duration, opening up new applications in creative tools and synthetic data generation. Synthetic data is often used when training data is scarce or to cover edge cases, for example in autonomous driving.

Unlike alternative methods, AYG also allows multiple animated objects to be combined in a single scene. The researchers show what this looks like in a scene with some of their creations around a campfire.
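One reason such composition is straightforward with an explicit representation is that sets of Gaussians can simply be merged. The sketch below assumes each object is just an array of Gaussian centers. The `place` helper and the `dog` and `cat` objects are hypothetical and not taken from the AYG code.

```python
import numpy as np

def place(means, offset):
    """Shift an object's Gaussian centers to a position in the shared scene."""
    return means + np.asarray(offset, dtype=float)

# Two hypothetical objects, each a cloud of Gaussian centers around the origin.
rng = np.random.default_rng(1)
dog = rng.normal(scale=0.2, size=(100, 3))
cat = rng.normal(scale=0.2, size=(80, 3))

# Move the objects apart, then concatenate them into one scene.
scene = np.concatenate(
    [place(dog, [-1.0, 0.0, 0.0]), place(cat, [1.0, 0.0, 0.0])],
    axis=0,
)
```

In an animated scene, each object would keep its own deformation over time, and the same placement offset would be applied to every frame.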

https://the-decoder.de/wp-content/uploads/2023/12/AYG_ground_stage_compressed.mp4?_=3

Video: Nvidia

More information and examples can be found on the Align Your Gaussians project page.

Sources:

Arxiv