Researchers from Nvidia, the University of Toronto and MIT have developed a new AI system that can generate 3D animations from text descriptions.

Align Your Gaussians (AYG) represents 3D shapes as collections of 3D Gaussian functions and models their motion with deformation fields, which define how the Gaussians move over time to produce an animation. These so-called "3D Gaussians" have emerged in recent months as a possible alternative to the popular neural radiance fields (NeRFs).
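
To make the idea of "Gaussians plus a deformation field" more concrete, here is a minimal sketch in plain Python. It is not the researchers' implementation: the `Gaussian3D` class, the isotropic `scale`, and the hand-written `toy_deformation` function are all assumptions chosen for illustration. In a real system the deformation field would be learned rather than written by hand.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian3D:
    """One blob of the shape (hypothetical, simplified to an isotropic scale)."""
    mean: tuple      # (x, y, z) center of the Gaussian
    scale: float     # spread of the blob
    opacity: float   # how strongly it contributes when rendered


def toy_deformation(position, t):
    """Hand-written stand-in for a learned deformation field.

    Returns the (dx, dy, dz) offset of a point at time t in [0, 1).
    Here, points sway along x, and higher points (larger y) sway more.
    """
    _, y, _ = position
    return (0.1 * y * math.sin(2 * math.pi * t), 0.0, 0.0)


def deform(gaussians, t, field=toy_deformation):
    """Move every Gaussian center by the offset the field gives at time t."""
    moved = []
    for g in gaussians:
        dx, dy, dz = field(g.mean, t)
        x, y, z = g.mean
        moved.append(Gaussian3D((x + dx, y + dy, z + dz), g.scale, g.opacity))
    return moved


def animate(gaussians, num_frames):
    """Sample the deformed shape at evenly spaced times to get a frame list."""
    return [deform(gaussians, i / num_frames) for i in range(num_frames)]
```

Calling `animate(shape, 24)` on a list of `Gaussian3D` objects returns 24 deformed copies of the shape while leaving the original static shape untouched, which mirrors the split between a 3D shape and its motion described above.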

Video: Nvidia

The process combines the strengths of three kinds of AI models. The Stable Diffusion text-to-image model ensures that individual frames look realistic. A text-to-video model, trained on large video datasets, provides temporal feedback that leads to smooth motion. A 3D-aware multi-view model ensures that the generated objects remain geometrically consistent when seen from different angles.

By combining these models in a coordinated training process, the team says, AYG can optimize both the 3D shape representation and the deformation fields to produce animations with lively motion, realistic textures, and geometric consistency, directly from text prompts such as "a horse galloping across a meadow".
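
The general pattern of steering one set of parameters with several models at once can be sketched as a weighted sum of guidance signals inside a single optimization loop. The code below is a toy illustration under strong assumptions: the three "critics" are simple quadratic stand-ins rather than diffusion models, and the names `combine_guidance`, `make_quadratic_critic`, and `optimize`, as well as the weights, are hypothetical and not taken from the AYG work.

```python
def combine_guidance(params, critics, weights):
    """Return the weighted sum of the gradients suggested by each critic."""
    total = [0.0] * len(params)
    for name, critic in critics.items():
        grad = critic(params)
        for i, g in enumerate(grad):
            total[i] += weights[name] * g
    return total


def make_quadratic_critic(target):
    """Toy critic: gradient of 0.5 * ||params - target||^2.

    A stand-in for a model that pulls the parameters toward what it prefers.
    """
    return lambda params: [p - t for p, t in zip(params, target)]


def optimize(params, critics, weights, lr=0.1, steps=200):
    """Plain gradient descent on the combined guidance signal."""
    params = list(params)
    for _ in range(steps):
        grad = combine_guidance(params, critics, weights)
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


# Three stand-in critics that each prefer a different solution.
critics = {
    "image": make_quadratic_critic([1.0, 0.0]),      # per-frame appearance
    "video": make_quadratic_critic([0.0, 1.0]),      # smooth motion
    "multiview": make_quadratic_critic([1.0, 1.0]),  # geometric consistency
}
weights = {"image": 1.0, "video": 1.0, "multiview": 1.0}
result = optimize([0.0, 0.0], critics, weights)
```

With equal weights, the parameters settle on a compromise between the three preferences. Changing the weights shifts that compromise toward one critic, which is the kind of trade-off a coordinated training process has to balance.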

According to the researchers, AYG can also generalize to some new concepts that were not seen during training.

Team sees applications for creative tools and synthetic data

AYG also introduces new techniques for extending and linking animations over longer time scales than existing text-to-video models allow. In one example, the team shows dogs switching from a walking animation to a barking animation.

Video: Nvidia

The researchers believe that these methods could eventually be used to generate 4D scenes and simulations of any duration, opening up new applications in creative tools and synthetic data generation. Synthetic data is often used when real training data is scarce or to cover edge cases, for example in autonomous driving.

Unlike alternative methods, AYG also allows multiple animated objects to be combined in a single scene. The researchers show what this looks like in a scene with some of their creations around a campfire.
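
One plausible reason an explicit representation makes this kind of composition convenient is that each object is just a set of points in space, so several objects can be positioned and merged into one scene. The sketch below illustrates that idea only. The `place` and `compose_scene` helpers are hypothetical, each object is reduced to a list of Gaussian centers, and the article does not describe how AYG actually assembles its scenes.

```python
def place(centers, offset):
    """Shift every Gaussian center of one object by a scene offset."""
    ox, oy, oz = offset
    return [(x + ox, y + oy, z + oz) for (x, y, z) in centers]


def compose_scene(objects):
    """Merge several objects into one scene.

    objects is a list of (centers, offset) pairs, where centers is a list
    of (x, y, z) tuples and offset is where the object sits in the scene.
    """
    scene = []
    for centers, offset in objects:
        scene.extend(place(centers, offset))
    return scene


# Two tiny stand-in objects placed on opposite sides of the origin.
dog = [(0.0, 0.0, 0.0), (0.0, 0.5, 0.0)]
horse = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
scene = compose_scene([(dog, (-1.0, 0.0, 0.0)), (horse, (1.0, 0.0, 0.0))])
```

For an animated scene, the same placement could be applied to every frame of each object's animation before merging.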

Video: Nvidia

More information and examples can be found on the Align Your Gaussians project page.

Summary
  • Researchers from Nvidia, the University of Toronto, and MIT have developed an AI system called Align Your Gaussians (AYG) that generates 3D animations from text descriptions.
  • AYG combines different AI models to create animations with vivid motion, realistic textures, and geometric consistency based on textual input such as "a horse galloping across a meadow".
  • The researchers see future applications of AYG in creative tools and the generation of synthetic data used, for example, in autonomous driving.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.