Summary

To control the behavior of physics-based characters through language, Nvidia's PADL combines a language model with reinforcement learning.


Among the most memorable images from the start of the current AI wave are the strangely moving 3D figures from DeepMind and other research institutions. These three-legged spiders and humanoid 3D puppets had learned their movements through reinforcement learning.

Digital creatures with various arms and legs crawl across the screen.
Via "Deep Evolutionary Reinforcement Learning," Stanford's "Unimals" evolved and learned to walk, among other things. | Image: Stanford University

There are now numerous approaches to making digital animals or human-like figures learn movements on their own. The goal of these methods is to develop AI systems that can generate natural-looking movements for a variety of simulated figures and thus complement or replace manual animation and motion capture processes in the long term.

Nvidia's PADL makes AI animation controllable by language

For AI animation to be used in industrial workflows, it must be controllable. Nvidia now introduces "Physics-based Animation Directed with Language" (PADL), a framework that combines advances in natural language processing with reinforcement learning methods to create a language-driven system.


PADL is trained in three stages. In the first, the skill embedding phase, Nvidia uses a language encoder and a motion encoder to train a shared embedding space on short videos of motions and their associated text descriptions.
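The idea of a shared embedding space can be illustrated with a minimal sketch. It assumes that the two encoders have already mapped captions and motions to vectors of the same size. The hand-written three-dimensional vectors and the names `cosine_similarity` and `closest_motion` are hypothetical and not taken from PADL, whose encoders are learned neural networks.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for motion-encoder outputs (assumed values, not real data).
MOTION_EMBEDDINGS = {
    "walk_clip": [0.9, 0.1, 0.0],
    "sprint_clip": [0.1, 0.9, 0.0],
    "shield_bash_clip": [0.0, 0.1, 0.9],
}

def closest_motion(text_embedding, motion_embeddings=MOTION_EMBEDDINGS):
    """Return the name of the motion whose embedding best matches the text."""
    return max(
        motion_embeddings,
        key=lambda name: cosine_similarity(text_embedding, motion_embeddings[name]),
    )

# Toy stand-in for the language encoder's output for a caption about running fast.
caption_embedding = [0.2, 0.8, 0.1]
print(closest_motion(caption_embedding))  # prints "sprint_clip"
```

Because text and motion live in the same space, a caption can be compared directly with a motion, which is what lets a later stage condition a policy on either one.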

The embedding space combines language and skills seen in the video and is used in the second phase to learn multiple policies for solving simple tasks, such as moving toward a specific object. In the third phase, Nvidia merges the different learned policies (multitask aggregation).

Nvidia uses an embedding space trained with text descriptions and video clips to train different strategies, which are then combined. | Image: Nvidia

The resulting model can then be controlled by language: Users can use text input to assign a character a specific task and a corresponding skill, such as "sprint to the red block" or "face the target and hit it with the shield."
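How a single model might serve several tasks can be sketched as a simple router. This is an illustration under stated assumptions, not Nvidia's implementation: the keyword rule in `pick_task`, the task names, and the string "actions" are all hypothetical placeholders for learned components.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A policy maps (state, skill vector) to an action; actions are plain labels here.
Policy = Callable[[List[float], List[float]], str]

def go_to_target(state, skill):
    """Placeholder for a learned policy that moves toward an object."""
    return "step toward target"

def strike_target(state, skill):
    """Placeholder for a learned policy that hits an object."""
    return "swing at target"

@dataclass
class AggregatedPolicy:
    """Routes each request to one of several task-specific policies."""
    task_policies: Dict[str, Policy]

    def act(self, task, state, skill):
        if task not in self.task_policies:
            raise KeyError(f"unknown task: {task}")
        return self.task_policies[task](state, skill)

def pick_task(command):
    """Hypothetical keyword rule standing in for language-based task selection."""
    return "strike" if "hit" in command.lower() else "location"

policy = AggregatedPolicy({"location": go_to_target, "strike": strike_target})
command = "sprint to the red block"
print(policy.act(pick_task(command), state=[0.0], skill=[0.0]))
```

In this sketch the skill vector is passed through unchanged. In a real system it would come from the language encoder and shape how the chosen task is carried out, for example sprinting rather than walking.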

Characters automatically learn related movements

By training with different movements and corresponding text descriptions, the model can interpolate between related movements, such as slow walking and sprinting.

In the video, you can see the character gradually increasing its speed or slowly squatting from a standing position without having seen the intermediate steps in training. However, the model cannot handle completely new skills, such as cartwheels, or tasks it has not seen in training.
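One way to picture this interpolation is as a blend between two points in the embedding space. The sketch below assumes linear interpolation and uses made-up two-dimensional vectors. Whether PADL blends skills in exactly this way is an assumption, and the name `lerp` is only illustrative.

```python
def lerp(a, b, t):
    """Linearly interpolate between vectors a and b; t=0 gives a, t=1 gives b."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]

# Toy skill embeddings (assumed values, not taken from the model).
slow_walk = [1.0, 0.0]
sprint = [0.0, 1.0]

# Sweep from walking to sprinting in five evenly spaced steps.
for step in range(5):
    t = step / 4
    print(t, lerp(slow_walk, sprint, t))
```

Feeding each intermediate vector to the policy in turn would correspond to the gradual speed-up described above. A cartwheel, by contrast, lies outside the region spanned by the training skills, so no blend of known embeddings reaches it.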


Nvidia wants to train PADL with a much larger dataset of annotated motion capture recordings for more skills, and abandon the few fixed tasks in favor of a more general approach to policy training.

More information can be found on the PADL project page. The code will be released there soon.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
  • Nvidia's PADL makes AI-generated animations controllable via language.
  • Users can use text input to assign a specific task to a character, such as "sprint to the red block".
  • The model can also interpolate between related movements, such as slow walking and sprinting.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.