
A new AI system called TANGO can generate realistic videos of people gesturing and moving to match any audio recording. This technology could make it even harder to spot fake videos online.


TANGO works in three main steps. First, it analyzes reference videos of a person moving to create a "motion graph," which represents possible body positions and the transitions between them.

Next, it selects appropriate movement sequences to match a target audio clip. Finally, an AI model generates transitional frames to create smooth motion.
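
To make the three steps more concrete, here is a minimal toy sketch in Python. It is not TANGO's actual code: the frame data, the function names (`build_motion_graph`, `select_path`, `interpolate`), the distance threshold, and the single "audio score" per frame are all assumptions made for illustration. In particular, the linear blend in step 3 is only a stand-in for the AI model that generates transitional frames.

```python
from math import dist

# Toy "reference video": each frame has a 2-D pose vector and one audio score.
# All values are invented for illustration (assumption, not from the paper).
frames = [
    {"pose": (0.0, 0.0), "audio": 0.1},
    {"pose": (0.1, 0.0), "audio": 0.3},
    {"pose": (0.2, 0.1), "audio": 0.8},
    {"pose": (0.1, 0.1), "audio": 0.5},
    {"pose": (0.9, 0.9), "audio": 0.9},
]


def build_motion_graph(frames, threshold=0.2):
    """Step 1: link frame i to frame j when their poses are close enough."""
    graph = {i: [] for i in range(len(frames))}
    for i, a in enumerate(frames):
        for j, b in enumerate(frames):
            if i != j and dist(a["pose"], b["pose"]) <= threshold:
                graph[i].append(j)
    return graph


def select_path(graph, frames, target_audio, start=0):
    """Step 2: greedy walk that picks, for each target audio value,
    the neighbouring frame whose audio score is closest to it."""
    path = [start]
    for target in target_audio:
        neighbours = graph[path[-1]]
        if not neighbours:  # dead end: no similar pose to move to
            break
        best = min(neighbours, key=lambda j: abs(frames[j]["audio"] - target))
        path.append(best)
    return path


def interpolate(pose_a, pose_b, steps=2):
    """Step 3 (stand-in): linearly blend two poses to fill a transition.
    The system described above uses a learned model here instead."""
    return [
        tuple(a + (b - a) * t / (steps + 1) for a, b in zip(pose_a, pose_b))
        for t in range(1, steps + 1)
    ]


graph = build_motion_graph(frames)
path = select_path(graph, frames, target_audio=[0.3, 0.8, 0.5])
print(path)
```

The greedy walk keeps the example short; a real system would more likely search over whole paths rather than choosing one frame at a time.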

The researchers say TANGO's key innovation is using "hierarchical audio motion embedding." This allows it to capture both short-term and long-term connections between speech and gestures, resulting in more natural-looking movements.

Collage: Various people gesticulate in front of colorful backgrounds, TANGO logo placed in the middle.
TANGO generates gesture video in three steps: creation of a motion graph, audio-based path selection and interpolation of discontinuous transitions. This method enables the synthesis of realistic gesture video segments to match audio input. | Image: Liu et al.

In tests, TANGO outperformed existing methods both on objective metrics and in user studies. The system could be used in film production or for virtual avatars but, of course, also to make deepfakes easier to create and more convincing.

Fake video seems unstoppable these days

As AI-generated videos become increasingly realistic, it's getting harder for people to verify what's real online. Trusting reputable sources may become more important than trying to authenticate every video. The sheer volume of potential fakes makes catching them all nearly impossible.

TANGO shows how advanced synthetic media creation has become. Users should be very skeptical of supposedly authentic videos from unverified sources.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Summary
  • Researchers have developed an AI system called TANGO that can generate realistic videos of people gesturing and moving to match any audio recording, potentially making it even harder to spot fake videos online.
  • TANGO works by analyzing reference videos to create a "motion graph" of possible body positions, selecting appropriate movement sequences to match a target audio clip, and using an AI model to generate transitional frames for smooth motion.
  • While TANGO could have applications in film production or virtual avatars, it also raises concerns about the increasing difficulty of verifying the authenticity of videos online, making it more important for users to rely on reputable sources and be skeptical of unverified content.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.