A new AI system called TANGO can generate realistic videos of people gesturing and moving to match any audio recording. This technology could make it even harder to spot fake videos online.
TANGO works in three main steps. First, it analyzes reference videos of a person moving to build a "motion graph," a structure that represents the possible transitions between different postures or body positions.
Next, it selects appropriate movement sequences to match a target audio clip. Finally, an AI model generates transitional frames to create smooth motion.
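To make the first two steps more concrete, here is a minimal toy sketch in Python. It is an illustration only, not TANGO's actual code: the clip names, the graph, the per-segment match scores, and the `select_path` helper are all hypothetical. The third step, generating transitional frames, needs a trained video model and is not shown.

```python
# Toy sketch of steps 1-2: a motion graph and audio-driven path selection.
# All names, clips, and scores are hypothetical, not taken from TANGO.

# Step 1: a motion graph. Nodes are short clips from the reference video;
# each list names the clips that may follow without a visible jump.
motion_graph = {
    "rest": ["rest", "raise_hand"],
    "raise_hand": ["wave", "rest"],
    "wave": ["wave", "rest"],
}

def select_path(graph, scores, start):
    """Step 2: pick the allowed path with the highest total match score.

    scores[t][node] is an assumed audio-motion match score for audio
    segment t. Uses simple dynamic programming over the graph.
    """
    # best[node] = (total score, path) for the best path ending at node
    best = {start: (scores[0].get(start, 0.0), [start])}
    for t in range(1, len(scores)):
        nxt = {}
        for node, (total, path) in best.items():
            for succ in graph[node]:
                cand = total + scores[t].get(succ, 0.0)
                if succ not in nxt or cand > nxt[succ][0]:
                    nxt[succ] = (cand, path + [succ])
        best = nxt
    return max(best.values(), key=lambda v: v[0])[1]

# Made-up scores for four consecutive audio segments.
audio_scores = [
    {"rest": 0.9, "raise_hand": 0.1, "wave": 0.0},
    {"rest": 0.2, "raise_hand": 0.7, "wave": 0.1},
    {"rest": 0.1, "raise_hand": 0.2, "wave": 0.8},
    {"rest": 0.6, "raise_hand": 0.1, "wave": 0.3},
]

path = select_path(motion_graph, audio_scores, start="rest")
print(path)
```

In this toy setup the path can only follow edges that exist in the graph, which is what keeps the selected sequence physically plausible before any frames are synthesized.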
The researchers say TANGO's key innovation is using "hierarchical audio motion embedding." This allows it to capture both short-term and long-term connections between speech and gestures, resulting in more natural-looking movements.
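The article does not say how this embedding is computed, so the following Python sketch only illustrates the general idea of scoring a match at two time scales. It assumes audio and motion have already been turned into per-frame feature vectors; the `two_scale_score` function, the mean pooling, and the equal weighting are assumptions made for illustration, not the researchers' method.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def mean_pool(frames):
    """Average a list of frame vectors into one clip-level vector."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]

def two_scale_score(audio_frames, motion_frames, weight=0.5):
    """Blend a short-term (per-frame) and long-term (whole-clip) match.

    Hypothetical scoring rule: 'weight' balances the two scales.
    """
    local = sum(
        cosine(a, m) for a, m in zip(audio_frames, motion_frames)
    ) / len(audio_frames)
    overall = cosine(mean_pool(audio_frames), mean_pool(motion_frames))
    return weight * local + (1 - weight) * overall

# Made-up 2-D features for a two-frame clip.
audio = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # matches frame by frame
shuffled = [[0.0, 1.0], [1.0, 0.0]]  # same content, wrong timing

print(two_scale_score(audio, aligned))
print(two_scale_score(audio, shuffled))
```

The shuffled clip still matches at the clip level but not frame by frame, so it scores lower than the aligned one. A score that looked at only one scale could not tell these two cases apart in the same way.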
In tests, TANGO outperformed existing methods both on objective metrics and in user studies. The system could be used in film production or for virtual avatars, but it could also make deepfakes easier to produce and more convincing.
Fake video seems unstoppable these days
As AI-generated videos become increasingly realistic, it's getting harder for people to verify what's real online. Trusting reputable sources may become more important than trying to authenticate every video. The sheer volume of potential fakes makes catching them all nearly impossible.
TANGO shows how advanced synthetic media creation has become. Users should be very skeptical of supposedly authentic videos from unverified sources.