Meta gets one step closer to its universal language translator

Meta wants to eliminate language barriers with the help of AI-assisted translation. A new system can translate the mainly spoken language Hokkien into English and vice versa in real time.

Meta connects people on social networks and one day, perhaps, in the Metaverse. In the process, the company has identified the language barrier as a hurdle and has been researching how to overcome it for years.

Direct speech-to-speech translation

Now Meta is unveiling the next step toward its grand vision of a Universal Speech Translator. Its new system can translate Hokkien, a low-resource and primarily spoken language used in Taiwan, into English and back in real time. The system translates speech directly into speech without taking a detour through text translation.

About 45 million people speak Hokkien worldwide, according to Meta. | Image: Meta

The main challenge, according to Meta, was the scarcity of training data. Meta therefore used Mandarin as a bridge language when building training data: spoken Hokkien was first translated into Mandarin text and then into spoken English, and vice versa. Using a resource-rich language in this way significantly improved model performance, according to Meta.

Using a speech encoder, Meta was additionally able to encode Hokkien speech into the same semantic embedding space as other languages, where it could then be aligned with spoken and written English. From the aligned English texts, Meta in turn generated spoken English, thus obtaining parallel Hokkien and English speech. Meta calls this process "speech mining."
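To make the idea of aligning utterances in a shared embedding space more concrete, here is a minimal sketch in Python. It is not Meta's implementation: the embeddings are invented toy vectors, the function name `mine_pairs` and the utterance IDs are hypothetical, and a plain cosine-similarity threshold stands in for whatever scoring Meta actually uses.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mine_pairs(
    hokkien: Dict[str, Vector],
    english: Dict[str, Vector],
    threshold: float = 0.8,  # assumed cut-off, chosen only for this toy example
) -> List[Tuple[str, str, float]]:
    """For each Hokkien utterance, keep its closest English item if it is similar enough."""
    pairs = []
    for hok_id, hok_vec in hokkien.items():
        best_id, best_score = None, -1.0
        for en_id, en_vec in english.items():
            score = cosine(hok_vec, en_vec)
            if score > best_score:
                best_id, best_score = en_id, score
        if best_id is not None and best_score >= threshold:
            pairs.append((hok_id, best_id, best_score))
    return pairs


# Toy embeddings (made up); a real encoder would produce much larger vectors.
hokkien_embeddings = {
    "hok_001": [0.9, 0.1, 0.0],
    "hok_002": [0.0, 0.2, 0.9],
    "hok_003": [0.5, 0.5, 0.5],  # no close English counterpart
}
english_embeddings = {
    "en_a": [1.0, 0.0, 0.0],
    "en_b": [0.0, 0.0, 1.0],
}

mined = mine_pairs(hokkien_embeddings, english_embeddings)
for hok_id, en_id, score in mined:
    print(f"{hok_id} <-> {en_id} (similarity {score:.2f})")
```

In this toy run, two of the three Hokkien utterances find an English match, and the third is dropped because nothing in the English set is similar enough. The threshold controls the trade-off between the amount of mined data and its quality.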

For speech-to-speech translation, Meta used speech-to-unit translation (S2UT), an approach previously developed at Meta that translates speech input directly into a sequence of acoustic units. In addition, Meta used UnitY as a two-pass decoding mechanism: the first-pass decoder generates text in a related language (Mandarin), and the second-pass decoder creates the acoustic units.
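The data flow of such a two-pass decoder can be sketched roughly as follows. This is only a structural illustration with toy stand-ins, not UnitY itself: all function names are hypothetical, and the article does not describe how the second pass is conditioned, so here it simply receives the first-pass text. Turning the acoustic units into an audible waveform is outside the scope of the sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class TwoPassResult:
    intermediate_text: str     # first-pass output (Mandarin text in the article)
    acoustic_units: List[int]  # second-pass output (discrete unit IDs)


def two_pass_decode(
    speech_features: Sequence[float],
    encode: Callable[[Sequence[float]], List[float]],
    first_pass: Callable[[List[float]], str],
    second_pass: Callable[[str], List[int]],
) -> TwoPassResult:
    """Run the encoder, then the text pass, then the unit pass."""
    hidden = encode(speech_features)
    text = first_pass(hidden)   # pass 1: text in a related language
    units = second_pass(text)   # pass 2: acoustic units
    return TwoPassResult(text, units)


# Toy stand-ins (assumptions): real components would be trained neural networks.
def toy_encode(features: Sequence[float]) -> List[float]:
    return [x * 2.0 for x in features]


def toy_first_pass(hidden: List[float]) -> str:
    return "你好"


def toy_second_pass(text: str) -> List[int]:
    return list(range(len(text)))


result = two_pass_decode([0.1, 0.2, 0.3], toy_encode, toy_first_pass, toy_second_pass)
print(result.intermediate_text, result.acoustic_units)
```

The point of the intermediate text is that the model can lean on a written, resource-rich language even though the input and output are speech.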

According to Meta, the methods first developed for Hokkien can be applied to many other written and unwritten languages. Meta is releasing the system and a large corpus of speech-to-speech translations as open source to support the development of other translation systems. A Hokkien demo is available on Hugging Face.

(1/3) Until now, AI translation has focused mainly on written languages. Universal Speech Translator (UST) is the 1st AI-powered speech-to-speech translation system for a primarily oral language, translating Hokkien, one of many primarily spoken languages. https://t.co/onYKQ8uoKN pic.twitter.com/Iy8MRMOypQ

— Meta AI (@MetaAI) October 19, 2022

Meta has been researching AI translation for years

The Hokkien model can currently translate only one full sentence at a time. Nevertheless, Meta sees the model as a step towards a future with simultaneous translation between many languages. To achieve this, Meta relies on unsupervised (also called self-supervised) AI training with large amounts of speech and text data combined with speech recognition, text-to-text translation, and text-to-speech synthesis.

Meta's development approach for AI translation of many languages in real time relies on self-supervised learning. | Image: Meta

"Our progress in unsupervised learning demonstrates the feasibility of building high-quality speech-to-speech translation models without any human annotations," Meta writes.

For Meta CEO Mark Zuckerberg, universal translation is a "superpower that people have always dreamed of." Meta unveiled an AI system trained without supervision using back-translation in 2018; M2M-100, a system capable of translating 100 languages, in 2020; and a successor in 2021 that scored top marks in the WMT2021 translation benchmark.

In February 2022, Meta introduced the "No Language Left Behind" project for real-time universal translation, even of rare languages. This was followed in summer 2022 by NLLB-200, a model for translating 200 languages.

All of Meta's AI translation research comes together under the umbrella of the Universal Speech Translator project, the company's grand vision.
