AI startup Synthesia has unveiled the fourth generation of its AI avatars, called "Expressive Avatars," which the company says aim to realistically replicate emotions, voice pitch, and body language.

The Expressive Avatars are powered by Synthesia's new "EXPRESS-1" model, designed to make avatar performances more lifelike. According to the company, the avatars can now deliver scripts with the proper intonation, body language, and lip sync, just like a real actor.

Every movement and facial expression is predicted in real time and seamlessly synchronized with the timing, intonation, and emphasis of the spoken language, resulting in a natural and human-like performance, Synthesia says.

The EXPRESS-1 model uses large, pre-trained models as the backbone for the performance of Expressive Avatars, combined with diffusion to "model complex multimodal distributions," the company explains. Synthesia claims its avatars generate unique performances based on the script, unlike competing solutions that rely on predefined dynamics.

Potential applications include presentations, marketing, knowledge transfer, and onboarding. Synthesia believes that as Gen Z and Gen Alpha enter the workforce, video will become the standard medium for workplace collaboration and communication, and it positions its platform as one way for companies to produce more video in place of text.

Synthesia's progress suggests that AI-generated people in videos will soon be nearly indistinguishable from real people, as is already the case with AI-generated photos. The company emphasizes that it has implemented additional measures to prevent misuse of its platform, such as policies restricting certain content, investment in early detection of malicious actors, and experiments with content identification technologies.

London-based Synthesia last raised $90 million in June 2023, achieving a valuation of $1 billion. The startup was founded in 2017 by researchers and entrepreneurs from University College London, Stanford, the Technical University of Munich, and Cambridge.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Summary
  • Synthesia, an AI startup for video communication, has introduced the fourth generation of its AI avatars, called "Expressive Avatars." Based on the "EXPRESS-1" model, they are designed to realistically reproduce emotions, voice pitch, and body language.
  • The avatars use large, pre-trained models and diffusion to create unique representations. Movements and facial expressions are predicted in real time and synchronized with the timing, intonation and emphasis of spoken language.
  • Applications include presentations, marketing, knowledge transfer and onboarding. Synthesia sees its platform as an approach to the challenge of increasing the use of video in corporate communications.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.