AI startup HeyGen has unveiled a tool that can translate videos up to five minutes long into different languages. The software also clones the voice of the person in the video and adjusts lip movements accordingly.
In a test by Twitter user Jon Finger, the translation worked flawlessly despite the background noise of a busy street. In the edited video, the distracting sounds are filtered out.
The synthetic voice in a different language is very reminiscent of the original, although it still sounds slightly robotic or mechanical. It is also remarkable how faithfully the intonation is reproduced.
The AI's changes to the lip movements are barely visible to the naked eye. HeyGen apparently makes another adjustment to the video track, though: the face appears much brighter after the translation than in the original clip.
Testing out @HeyGen_Official translation on French and German. I don’t speak either language so let me know if it sounds natural if you do.
I hope if you pay you can turn off the color correction.
It didn’t work on my phone so I had to upload on my pc. https://t.co/FMJp9sJEBI pic.twitter.com/iF5eONAQ3c
Jon Finger (@mrjonfinger), September 11, 2023
As a native speaker, I can confirm that the German translation is somewhat monotonous, but definitely authentic. If I didn't know it was an AI translation, I might find the intonation unusual, but not unnatural.
The official demo video features popular tech YouTuber Marques Brownlee with a Spanish voice and Apple CEO Tim Cook with a Hindi voice.
Beta: Ten languages input, eight languages output
The "Video Translate" tool within the experimental offerings of HeyGen Labs is currently in an open beta phase. At launch, it supports English, Spanish, French, Chinese, German, Italian, Portuguese, Dutch, Hindi, and Japanese as input languages. However, it can only translate into English, Spanish, French, Hindi, Italian, German, Polish and Portuguese. "Many more" languages will be added in the coming weeks.
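The asymmetry between input and output languages is easy to overlook, so here is a minimal Python sketch that encodes the two lists exactly as given above. The function name `can_translate` is hypothetical and used only for illustration; it is not part of any HeyGen API, and the lists reflect the beta launch only.

```python
# Language lists as stated for the beta launch; they are expected to grow.
INPUT_LANGUAGES = {
    "English", "Spanish", "French", "Chinese", "German",
    "Italian", "Portuguese", "Dutch", "Hindi", "Japanese",
}
OUTPUT_LANGUAGES = {
    "English", "Spanish", "French", "Hindi",
    "Italian", "German", "Polish", "Portuguese",
}


def can_translate(source: str, target: str) -> bool:
    """Hypothetical helper: is this language pair covered at launch?"""
    return source in INPUT_LANGUAGES and target in OUTPUT_LANGUAGES


# Languages accepted as input but not yet offered as output, and vice versa.
input_only = sorted(INPUT_LANGUAGES - OUTPUT_LANGUAGES)
output_only = sorted(OUTPUT_LANGUAGES - INPUT_LANGUAGES)
```

Under these lists, a Japanese clip could be translated into German, but an English clip could not be translated into Japanese. Chinese, Dutch, and Japanese are input-only, while Polish is output-only.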
HeyGen lists several technical requirements: a minimum length of 30 seconds, a file type of MP4, QuickTime, or WebM, and a resolution between 480 x 480 and 1920 x 1920 pixels. Beyond these, it provides other tips for getting the best possible result.
For example, only one person's face should be visible from one angle in each video scene. If there is background noise or music in the video, clean translation is more difficult.
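The technical requirements can be checked before uploading. The following Python sketch is one way to do that; it is not HeyGen's own validation. It assumes that the five-minute limit mentioned at the top applies as a maximum length, that "between 480 x 480 and 1920 x 1920" means each side must fall within 480 to 1920 pixels, and that QuickTime files carry the `.mov` extension. `VideoInfo` and `check_requirements` are hypothetical names, and the metadata is expected to come from a tool of your choice.

```python
from dataclasses import dataclass
from pathlib import PurePath

MIN_SECONDS = 30                    # minimum length stated by HeyGen
MAX_SECONDS = 5 * 60                # assumption: five-minute cap from the announcement
ALLOWED_EXTENSIONS = {".mp4", ".mov", ".webm"}  # assumption: QuickTime means .mov
MIN_SIDE, MAX_SIDE = 480, 1920      # assumption: limits apply to each side


@dataclass
class VideoInfo:
    """Hypothetical container for metadata read elsewhere."""
    filename: str
    duration_seconds: float
    width: int
    height: int


def check_requirements(video: VideoInfo) -> list[str]:
    """Return a list of human-readable problems; empty means the clip qualifies."""
    problems = []
    extension = PurePath(video.filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or 'none'}")
    if not MIN_SECONDS <= video.duration_seconds <= MAX_SECONDS:
        problems.append(f"length must be {MIN_SECONDS}-{MAX_SECONDS} seconds")
    for side in (video.width, video.height):
        if not MIN_SIDE <= side <= MAX_SIDE:
            problems.append(f"each side must be {MIN_SIDE}-{MAX_SIDE} pixels")
            break  # report the resolution problem only once
    return problems
```

A 45-second 1280 x 720 MP4 passes with no problems, while a 10-second 320 x 240 AVI fails all three checks. The content-related tips, such as a single visible face and little background noise, cannot be verified from metadata alone and are left out here.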
Get two minutes free, then pay from $29 per month
HeyGen, formerly known as Movio.la, offers users two free credits after signing up, which can be used to edit or create up to two minutes of video.
An additional 15 credits are available starting at $29 per month and can be used across all of HeyGen's browser-based software offerings. Apart from the video translation feature, HeyGen has so far focused primarily on virtual AI avatars.
The idea of synchronizing voice and lips is not new, but it is causing a stir, especially in the film industry, and stoking existential fears among voice actors. HeyGen is making this technology available not only to big movie studios with huge budgets, but also to smaller companies and home users.
In 2021, for example, an Israeli startup made headlines with such a service. No major production has yet been released with AI dubbing. Still, "Top Gun: Maverick" used AI to bring back Val Kilmer's voice, so it's fair to say that the technology is making its way into mainstream movies.