
Motion Diffusion turns text into lifelike human animations

Image: Tevet et al.

Motion Diffusion can create natural-looking human animations from various inputs such as text, actions, or existing animations.

So far, 2022 is the year of generative AI systems that create new media from text: DALL-E 2, Midjourney, Imagen, and Stable Diffusion produce photorealistic or artistic images, Make-a-Video and Imagen Video produce short video clips, AudioGen and AudioLM generate audio, and CLIP-Mesh and DreamFusion create 3D models from text.

Now, in a new paper, Tel Aviv University researchers turn their attention to generating human motion. Their Motion Diffusion Model (MDM) can, among other things, generate matching animations based on text.

"The holy grail of computer animation"

Automated generation of natural and expressive motion is the holy grail of computer animation, according to the researchers. The biggest challenges, they say, are the wide variety of possible movements and humans' ability to perceive even slight flaws as unnatural.


A person's walk from A to B does include repetitive features, but there are countless variations in how each movement is executed.

In addition, movements are difficult to describe precisely: a kick, for example, could be a soccer kick or a karate kick.

Diffusion models used in current image generation systems such as DALL-E 2 have demonstrated remarkable generative capability and variability, making them a good fit for human motion, the team writes. For MDM, the researchers accordingly combined a diffusion model with a transformer architecture.
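The basic idea can be sketched in a few lines. The following is a minimal, hypothetical illustration of diffusion-based motion sampling, not the authors' implementation: motion is a sequence of pose vectors (the dimensions below are assumed, loosely following the HumanML3D pose format), the trained transformer is replaced by a toy placeholder, and a DDPM-style reverse loop turns pure noise into a motion sequence step by step.

```python
import numpy as np

# Assumed dimensions: a motion clip as a sequence of per-frame pose vectors.
FRAMES, JOINT_DIM = 60, 263

def denoiser(x_t, t, text_emb):
    # Stand-in for MDM's transformer, which (per the paper) predicts the
    # clean motion x0 directly rather than the added noise.
    # This toy placeholder is NOT a trained network.
    return x_t * 0.9

def diffusion_sample(steps=50, seed=0):
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, steps)        # noise schedule
    alpha_bars = np.cumprod(1.0 - betas)
    x = rng.standard_normal((FRAMES, JOINT_DIM))  # start from pure noise
    for t in reversed(range(steps)):
        x0_hat = denoiser(x, t, text_emb=None)    # predict clean motion
        if t > 0:
            # re-noise the prediction back to noise level t-1 (DDPM-style)
            noise = rng.standard_normal(x.shape)
            x = (np.sqrt(alpha_bars[t - 1]) * x0_hat
                 + np.sqrt(1.0 - alpha_bars[t - 1]) * noise)
        else:
            x = x0_hat
    return x

motion = diffusion_sample(seed=0)
print(motion.shape)  # (60, 263)
```

Because sampling starts from random noise, running the loop again with a different seed yields a different motion for the same conditioning, which is where the model's variety of outputs per prompt comes from.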

Motion diffusion model is versatile and beats specialized models

The researchers' model is a generic framework that is suitable for various forms of input. In their work, they show examples of text-to-motion, action-to-motion, and completion and manipulation of existing animations.


In a text-to-motion task, MDM generates an animation that corresponds to a text description. Thanks to the diffusion model, the same prompt generates different variants.

"A person kicks." | Video: Tevet et al.


"a person turns to his right and paces back and forth." | Video: Tevet et al.

In the action-to-motion task, MDM generates animations that match a particular motion class, such as "sitting down" or "walking."

(Class) Run | Video: Tevet et al.

In addition, the model can complete or edit motions. The researchers compare their method with inpainting, which allows users to mark parts of an image in DALL-E 2 or Stable Diffusion and change them via text description.

(Blue=Input, Gold=Synthesis) | Video: Tevet et al.

During editing, individual body parts can be selectively animated while others stay fixed or keep their original animation.

Upper body editing (lower body is fixed) (Blue=Input, Gold=Synthesis) | Video: Tevet et al.
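This kind of editing maps naturally onto diffusion inpainting. The sketch below is a hypothetical illustration of one such step, not the authors' code: a mask marks the joints to keep, and after each denoising step the kept region is overwritten with a noised copy of the source motion at the current noise level, so only the unmasked region is freely synthesized. The split of the pose vector into "lower body" dimensions is an assumption for the example.

```python
import numpy as np

FRAMES, JOINT_DIM = 60, 263  # assumed motion representation

def edit_step_overwrite(x_t, source_motion, mask, alpha_bar_t, rng):
    """Inpainting-style edit step: where mask == 1, replace the current
    sample with the source motion noised to the current diffusion level,
    so the model only synthesizes the mask == 0 region."""
    noise = rng.standard_normal(source_motion.shape)
    noised_source = (np.sqrt(alpha_bar_t) * source_motion
                     + np.sqrt(1.0 - alpha_bar_t) * noise)
    return mask * noised_source + (1.0 - mask) * x_t

rng = np.random.default_rng(0)
source = rng.standard_normal((FRAMES, JOINT_DIM))  # existing animation
x_t = rng.standard_normal((FRAMES, JOINT_DIM))     # current diffusion sample

# Hypothetical layout: treat the first 100 pose dimensions as the lower
# body and keep them fixed; the upper body is re-synthesized.
mask = np.zeros((FRAMES, JOINT_DIM))
mask[:, :100] = 1.0

x_before = x_t.copy()
x_t = edit_step_overwrite(x_t, source, mask, alpha_bar_t=0.5, rng=rng)
print(x_t.shape)  # (60, 263)
```

Repeating this overwrite at every denoising step keeps the fixed joints consistent with the input animation while the diffusion model fills in the rest, which is the same trick image inpainting uses for masked pixels.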

In benchmarks, MDM outperforms other generative motion models, the researchers write. Generating an animation currently takes about a minute on an Nvidia GeForce RTX 2080 Ti GPU; training the model took about three days.

In the future, the team wants to explore ways to control the animations even better and as a result expand the range of applications for the AI system. The code and model for MDM are available on GitHub.
