Bytedance has unveiled DreamActor-M1, a new AI system that gives users precise control over facial expressions and body movements in generated videos.
The system uses what the company calls "hybrid guidance": a combination of multiple control signals working together. DreamActor-M1's architecture has three main components. At its core is a facial encoder that can modify expressions independently of a person's identity or head position, which the Bytedance researchers say addresses a common limitation of previous systems.
The demo shows facial expressions and audio from one video being mapped onto both an animated character and a real person. | Video: Bytedance
The system manages head movements through a 3D model that uses colored spheres to direct gaze and head orientation. For body motion, it employs a 3D skeleton with an adaptive layer that adjusts the skeleton to different body proportions, so movements transfer more naturally between people.
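To make the fusion idea concrete, here is a minimal Python sketch of how three such control signals might be combined into conditioning tokens for a video generator. Every name and tensor shape, and the reduction of the pose spheres to plain yaw/pitch/roll angles, are illustrative assumptions, not Bytedance's actual code:

```python
# Hypothetical sketch of "hybrid guidance": three independent control
# signals projected into a shared space and stacked as conditioning
# tokens. Names and shapes are assumptions for illustration only.
import torch
import torch.nn as nn

class HybridGuidance(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.face_proj = nn.Linear(128, dim)     # identity-free expression latent
        self.head_proj = nn.Linear(3, dim)       # yaw/pitch/roll head pose
        self.body_proj = nn.Linear(18 * 3, dim)  # 18 joints x (x, y, z)

    def forward(self, expr_latent, head_pose, skeleton):
        # Stack the three projected signals; a video backbone would
        # cross-attend to these tokens at each generation step.
        return torch.stack([
            self.face_proj(expr_latent),
            self.head_proj(head_pose),
            self.body_proj(skeleton.flatten(-2)),
        ], dim=1)  # (batch, 3, dim)

def normalize_skeleton(driving, source_scale, target_scale):
    """Toy stand-in for the adaptive layer: rescale the driving
    skeleton toward the target person's proportions."""
    return driving * (target_scale / source_scale)

guidance = HybridGuidance()
skeleton = normalize_skeleton(torch.randn(1, 18, 3), 1.0, 1.1)
tokens = guidance(torch.randn(1, 128), torch.randn(1, 3), skeleton)
print(tokens.shape)  # torch.Size([1, 3, 256])
```

Keeping the expression latent separate from identity and head pose is what would let one signal change while the others stay fixed.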

During the training phase, the model learns from images of subjects captured at various angles. The researchers say this allows it to generate new viewpoints even from a single portrait, plausibly filling in unseen details such as clothing and pose.
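One plausible way to exploit such multi-angle data is to pair each reference frame with a target frame seen from a clearly different viewpoint, so the model must infer the regions it cannot see. The sketch below assumes a simple (frame, angle) clip format invented for this example, not Bytedance's data pipeline:

```python
# Sketch of multi-view pair sampling: pick reference and target frames
# from the same clip at different angles. Data format is hypothetical.
import random

def sample_training_pair(clip, min_angle_gap=30.0):
    """clip: list of (frame, yaw_degrees) tuples from one video."""
    ref = random.choice(clip)
    # Prefer targets seen from a clearly different angle, so the model
    # must infer regions (clothing, pose) not visible in the reference.
    far = [item for item in clip if abs(item[1] - ref[1]) >= min_angle_gap]
    target = random.choice(far) if far else random.choice(clip)
    return ref, target
```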

Training happens in three stages: first the model learns basic body and head movement, then it adds precisely controlled facial expressions, and finally everything is optimized together for more coordinated results. Bytedance says the model was trained on 500 hours of video, split evenly between full-body and upper-body footage.
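A staged schedule like this is commonly scripted by freezing and unfreezing parameter groups between phases. The sketch below shows one plausible way to do that, reusing the hypothetical HybridGuidance module from the earlier example; stage names and epoch counts are placeholders, not figures from the paper:

```python
# Hedged sketch of a freeze-then-unfreeze curriculum matching the three
# stages described above. Nothing here is Bytedance's actual code.
STAGES = [
    {"name": "pose",  "train": ("body_proj", "head_proj"), "epochs": 10},
    {"name": "face",  "train": ("face_proj",),             "epochs": 10},
    {"name": "joint", "train": ("body_proj", "head_proj", "face_proj"), "epochs": 5},
]

def run_curriculum(model, train_epoch):
    for stage in STAGES:
        # Freeze all parameters, then re-enable only this stage's branches.
        for name, param in model.named_parameters():
            param.requires_grad = name.startswith(stage["train"])
        for _ in range(stage["epochs"]):
            train_epoch(model, stage["name"])

# Example: exercise the schedule with a no-op epoch function.
run_curriculum(HybridGuidance(), lambda model, stage: None)
```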
According to the researchers, DreamActor-M1 outperforms comparable systems, including commercial products such as Runway's Act-One, in both visual quality and motion control precision.
Video: Bytedance
The system does have limitations. It cannot handle dynamic camera movements, object interactions, or extreme differences in body proportions between source and target. Complex scene transitions also remain challenging.
Bytedance, which owns TikTok, is developing several AI avatar animation projects in parallel. Earlier this year, the company launched OmniHuman-1, which is already available as a lip-sync tool on CapCut's Dreamina platform, showing how quickly Bytedance moves research into products. Other ongoing projects include the Goku video AI series and the InfiniteYou portrait generator.