
New AI model generates 45-minute lip-synced video from one photo and runs in real time

Image: Nano Banana Pro, prompted by THE DECODER

Key Points

  • Researchers have introduced LPM 1.0, an AI model that generates real-time video of a speaking, listening, or singing character from just a single image, complete with lip-synced speech, subtle facial expressions like hesitation or gaze shifts, and smooth emotional transitions.
  • The model plugs directly into voice AI systems like ChatGPT and works across a wide range of visual styles, including photorealistic faces, anime, and 3D game characters.
  • The entire video generation runs as a real-time streaming process, with the system reportedly staying stable for videos up to 45 minutes long.

Researchers have introduced LPM 1.0, an AI model that generates real-time video of a speaking, listening, or singing figure from a single image.

The model processes text, audio, and reference images simultaneously, producing lip-synced speech, subtle facial expressions like hesitation or shifts in gaze, and smooth emotional transitions. It can plug directly into voice AI models such as ChatGPT or Doubao to create a real-time visual conversation partner.

LPM 1.0 works across different image styles (photorealistic faces, anime, and 3D game characters) without any additional training. The entire video generation runs as a real-time streaming process rather than rendering a finished video all at once. According to the researchers, videos up to 45 minutes long remain stable.
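The streaming approach described above, generating frames continuously as audio arrives instead of rendering a finished clip, can be illustrated with a minimal sketch. Everything here (`stream_frames`, the chunk format) is a hypothetical stand-in, not the model's actual interface:

```python
def stream_frames(audio_chunks):
    """Toy streaming loop: emit one video frame per incoming audio chunk,
    so playback can start immediately, instead of rendering the full
    clip at the end. A real system would run the model per chunk here."""
    for t, chunk in enumerate(audio_chunks):
        yield {"t": t, "frame": f"frame_for_{chunk}"}

# Frames become available as each audio chunk arrives, not after the
# whole audio stream has been processed.
frames = list(stream_frames(["a0", "a1", "a2"]))
```

Because the output is a generator, latency stays roughly constant per chunk regardless of total length, which is the property that makes 45-minute sessions plausible.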

LPM 1.0 uses what the researchers call "multi-granularity identity conditioning": alongside a main image, the model also receives reference images from different angles and with varying facial expressions. This means it doesn't have to invent details like teeth, emotion-specific wrinkles, or profile views on its own; it can pull them directly from the reference material.
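The exact conditioning interface isn't public, but the idea of bundling a main portrait with auxiliary references can be sketched roughly as follows. All names here (`IdentityCondition` and its fields) are hypothetical illustrations, not the paper's API:

```python
from dataclasses import dataclass, field

@dataclass
class IdentityCondition:
    """Hypothetical bundle of identity references fed to the generator.

    The main image fixes the character's base appearance; the extra
    references supply details (teeth, profile views, emotion-specific
    wrinkles) the model would otherwise have to invent on its own.
    """
    main_image: str                                    # primary portrait
    angle_refs: list[str] = field(default_factory=list)       # e.g. profile views
    expression_refs: dict[str, str] = field(default_factory=dict)  # emotion -> image

cond = IdentityCondition(
    main_image="portrait.png",
    angle_refs=["left_profile.png", "right_profile.png"],
    expression_refs={"smile": "smile.png", "frown": "frown.png"},
)
```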


The model recognizes three conversational states. When listening, it generates reactive facial expressions like nodding or gaze shifts based on incoming audio. When speaking, the response audio drives lip movements and body language. During pauses, LPM generates natural idle behavior based on text instructions.
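The three-state logic described above amounts to routing on which audio stream is active. A minimal sketch, assuming a simple priority scheme (response audio over user audio over idle) that the source does not spell out; `ConvState` and `pick_state` are hypothetical names:

```python
from enum import Enum, auto

class ConvState(Enum):
    LISTENING = auto()  # user audio active: reactive expressions, nods, gaze shifts
    SPEAKING = auto()   # response audio active: drives lips and body language
    IDLE = auto()       # no audio: text-prompted natural idle behavior

def pick_state(user_audio_active: bool, response_audio_active: bool) -> ConvState:
    """Toy routing: the character's own response audio takes priority,
    then incoming user audio, otherwise idle."""
    if response_audio_active:
        return ConvState.SPEAKING
    if user_audio_active:
        return ConvState.LISTENING
    return ConvState.IDLE
```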

Beyond real-time conversation, LPM 1.0 also supports offline video generation from existing audio, useful for podcasts or movie dialogs, according to project manager Ailing Zeng. This opens the door to content creation outside of live chats. Video-based input control isn't included in this version, but Zeng says the framework could support it in the future.

Still a research project with no public release planned

The development team stresses that LPM 1.0 is purely a research project. There are no plans to release weights, code, or a public demo. All faces shown are AI-generated, not real people. The researchers acknowledge that the generated videos still contain visible artifacts, and a quantitative analysis confirmed a noticeable gap compared to real video quality.

The team also says they'd only consider opening access "if and when adequate safeguards and responsible-use frameworks are firmly in place." More details are available on the project page and in the technical report.


Even as a research project, LPM 1.0 points to where things are heading: AI systems that don't just communicate through text or voice, but show up as visually believable characters with facial expressions, eye contact, and emotional reactions. That could prove valuable for education, gaming, customer service, or virtual companions.

At the same time, the technology carries serious risks. It edges dangerously close to real-time deepfake infrastructure that bad actors could exploit for fraud, manipulation, or impersonation. All of those things are already happening; what keeps shrinking is the barrier to entry. The researchers state explicitly that the system is not meant to mislead, deceive, or impersonate real people.


Source: Project page