Motion Diffusion turns text into lifelike human animations

Motion Diffusion can create natural-looking human animations from various inputs such as text, actions, or existing animations.

So far, 2022 is the year of generative AI systems that create new media from text: DALL-E 2, Midjourney, Imagen, and Stable Diffusion produce photorealistic or artistic images. Make-a-Video and Imagen Video produce short video clips, AudioGen and AudioLM generate audio, and CLIP-Mesh and Dreamfusion create 3D models from text.

Now, in a new paper, Tel Aviv University researchers turn their attention to generating human motion. Their Motion Diffusion Model (MDM) can, among other things, generate matching animations based on text.

"The holy grail of computer animation"

Automated generation of natural and expressive motion is the holy grail of computer animation, according to the researchers. They see two main challenges: the wide variety of possible movements, and the fact that humans perceive even slight flaws as unnatural.

Walking from A to B does involve some repetitive patterns, but there are countless variations in how exactly the movement is carried out.

In addition, movements are difficult to describe unambiguously: a kick, for example, can be a soccer kick or a karate kick.

Diffusion models, which power current image generators such as DALL-E 2, have demonstrated remarkable generative capabilities and variability, making them a good fit for human motion, the team writes. Accordingly, the researchers built MDM on a diffusion model and a transformer architecture.
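
To make the idea concrete: a diffusion model is trained by gradually adding Gaussian noise to real data and learning to reverse that process. The following Python sketch shows only the noising direction, applied to a toy motion sequence. It assumes the standard DDPM formulation with a linear noise schedule; the function names, the schedule values, and the toy joint angles are illustrative assumptions and are not taken from the MDM paper or code.

```python
import math
import random

def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed values)."""
    alpha_bar, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar

def add_noise(motion, t, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    signal = math.sqrt(alpha_bar[t])
    noise = math.sqrt(1.0 - alpha_bar[t])
    return [[signal * v + noise * rng.gauss(0.0, 1.0) for v in frame]
            for frame in motion]

# Toy "motion": 4 frames with 3 joint angles each (hypothetical values).
motion = [[0.1 * f, 0.2 * f, -0.1 * f] for f in range(4)]
alpha_bar = linear_alpha_bar(1000)
rng = random.Random(0)

slightly_noisy = add_noise(motion, 10, alpha_bar, rng)   # early step: mostly signal
mostly_noise = add_noise(motion, 999, alpha_bar, rng)    # final step: almost pure noise
```

At the last step, almost nothing of the original motion remains. A trained model learns to run this process in reverse, starting from noise and ending with a plausible motion.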

Motion diffusion model is versatile and beats specialized models

The researchers' model is a generic framework that is suitable for various forms of input. In their work, they show examples of text-to-motion, action-to-motion, and completion and manipulation of existing animations.

In a text-to-motion task, MDM generates an animation that corresponds to a text description. Thanks to the diffusion model, the same prompt generates different variants.

"A person kicks." | Video: Tevet et al.

"a person turns to his right and paces back and forth." | Video: Tevet et al.
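
Why does the same prompt produce different animations? Sampling from a diffusion model starts from random noise, so each random seed leads to a different result. The toy Python loop below illustrates this behavior. The `toy_denoise_step` function is a hypothetical stand-in for the trained, text-conditioned network, and none of the names or numbers come from MDM itself.

```python
import random

def toy_denoise_step(x, prompt, strength=0.2):
    """Hypothetical stand-in for a learned, text-conditioned denoiser."""
    # Fake "text conditioning": derive a target value from the prompt characters.
    target = (sum(ord(c) for c in prompt) % 10) / 10.0
    return [v + strength * (target - v) for v in x]

def sample_motion(prompt, seed, num_frames=8, num_steps=20, noise_scale=0.05):
    """Toy sampling loop: start from noise, denoise step by step."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(num_frames)]  # start from pure noise
    for step in range(num_steps):
        x = toy_denoise_step(x, prompt)
        if step < num_steps - 1:  # no fresh noise after the final step
            x = [v + noise_scale * rng.gauss(0.0, 1.0) for v in x]
    return x

variant_a = sample_motion("a person kicks", seed=1)
variant_b = sample_motion("a person kicks", seed=2)
repeat_a = sample_motion("a person kicks", seed=1)
```

Here `variant_a` and `variant_b` differ although the prompt is identical, while `repeat_a` reproduces `variant_a` because it reuses the same seed.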

In the action-to-motion task, MDM generates animations that match a particular motion class, such as "sitting down" or "walking."

(Class) Run | Video: Tevet et al.

In addition, the model can complete or edit motions. The researchers compare their method with inpainting, which allows users to mark parts of an image in DALL-E 2 or Stable Diffusion and change them via text description.

(Blue=Input, Gold=Synthesis) | Video: Tevet et al.

During an edit, individual parts of the body can be selectively animated, while others do not move or retain their original animation.

Upper body editing (lower body is fixed) (Blue=Input, Gold=Synthesis) | Video: Tevet et al.
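
A common way to get this kind of inpainting behavior from a diffusion model is to overwrite the fixed region with the known input at every sampling step, so that only the remaining region is synthesized. The Python sketch below illustrates that general idea on a toy motion. It is an assumption-laden illustration and not the MDM implementation: `toy_denoise`, `edit_motion`, the joint layout, and all values are hypothetical.

```python
import random

def toy_denoise(x):
    """Hypothetical stand-in for one step of a trained denoising network."""
    return [[0.9 * v for v in frame] for frame in x]

def edit_motion(input_motion, keep_mask, seed=0, num_steps=20):
    """Synthesize joints where keep_mask is False; keep the input elsewhere."""
    rng = random.Random(seed)
    # Start every joint of every frame from random noise.
    x = [[rng.gauss(0.0, 1.0) for _ in keep_mask] for _ in input_motion]
    for _ in range(num_steps):
        x = toy_denoise(x)
        # Overwrite the fixed joints with the original input after each step.
        x = [[src if keep else new
              for src, new, keep in zip(src_frame, new_frame, keep_mask)]
             for src_frame, new_frame in zip(input_motion, x)]
    return x

# Toy input: 5 frames, 4 joints. Joints 0-1 = lower body, joints 2-3 = upper body.
input_motion = [[0.5 + 0.1 * f, -0.5, 0.3 * f, 1.0] for f in range(5)]
keep_mask = [True, True, False, False]  # fix the lower body, edit the upper body

edited = edit_motion(input_motion, keep_mask)
```

In the result, the lower-body joints match the input exactly, while the upper-body joints come from the sampling loop.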

In benchmarks, MDM is ahead of other generative models for motion, the researchers write. Currently, generating an animation takes about a minute on an Nvidia GeForce RTX 2080 Ti GPU. The training of the model took about three days.

In the future, the team wants to explore ways to control the animations more precisely and thereby expand the range of applications for the AI system. The code and model for MDM are available on GitHub.
