Video AI avatars get better at expressing emotions

Apr 27, 2024 Matthias Bastian

AI startup Synthesia has unveiled the fourth generation of its AI avatars, called "Expressive Avatars," which the company says are designed to realistically replicate emotions, voice pitch, and body language.

The Expressive Avatars are powered by Synthesia's new "EXPRESS-1" model, designed to make avatar performances more lifelike. According to the company, the avatars can now deliver scripts with appropriate intonation, body language, and lip sync, much like a real actor.

Every movement and facial expression is predicted in real time and seamlessly synchronized with the timing, intonation, and emphasis of the spoken language, resulting in a natural and human-like performance, Synthesia says.

EXPRESS-1 uses large pre-trained models as its backbone and combines them with diffusion to "model complex multimodal distributions," the company explains. Synthesia claims this lets its avatars generate a unique performance for each script, whereas competing solutions rely on predefined dynamics.

Potential applications include presentations, marketing, knowledge transfer, and onboarding. Synthesia believes that as Gen Z and Gen Alpha enter the workforce, video will become the standard medium for collaboration and communication in the workplace, and it sees its platform as one way to address the challenge of replacing more text with video.

Synthesia's progress suggests that AI-generated people in videos will soon be nearly indistinguishable from real ones, as is already the case with AI-generated photos. The company emphasizes that it has implemented additional measures to prevent misuse of its platform, including policies that restrict certain content, investment in early detection of malicious actors, and experiments with content identification technologies.

London-based Synthesia last raised $90 million in June 2023, achieving a valuation of $1 billion. The startup was founded in 2017 by researchers and entrepreneurs from University College London, Stanford, the Technical University of Munich, and Cambridge.

Sources:

Synthesia