Summary

Meta wants to eliminate language barriers with the help of AI-assisted translation. A new system can translate Hokkien, a primarily spoken language, into English and vice versa in real time.

Meta connects people on social networks and one day, perhaps, in the Metaverse. In the process, the company has identified the language barrier as a hurdle and has been researching how to overcome it for years.

Direct speech-to-speech translation

Now Meta is unveiling a next step in machine translation that brings it closer to its grand vision of a Universal Speech Translator. Its new system can translate Taiwanese Hokkien, a low-resource and primarily spoken language, into English and back in real time. The system translates speech directly into speech, without taking a detour through a separate text translation step.

About 45 million people speak Hokkien worldwide, according to Meta. | Image: Meta

The challenge, according to Meta, lay particularly in the scarcity of training data. Meta, therefore, used Mandarin as a bridge language, translating spoken Hokkien into Mandarin text and then into spoken English and vice versa. The use of a resource-rich language significantly improved model performance, according to Meta.
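The pivot idea can be illustrated with a minimal Python sketch. It assumes two already-trained components, one mapping Hokkien speech to Mandarin text and one mapping Mandarin text to English speech. Here both are replaced by hypothetical lookup tables, and plain strings stand in for audio. Chaining them turns unlabeled Hokkien clips into (Hokkien, English) training pairs. This shows the general pivot technique, not Meta's actual code.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A "model" here is any function from one representation to another.
# Strings stand in for audio clips and text; real systems operate on waveforms.
Model = Callable[[str], str]


def from_table(table: Dict[str, str]) -> Model:
    """Hypothetical stand-in for a trained model: a fixed lookup table."""
    return lambda x: table[x]


def via_pivot(to_pivot: Model, from_pivot: Model) -> Model:
    """Chain source -> pivot and pivot -> target into source -> target."""
    return lambda x: from_pivot(to_pivot(x))


def make_pseudo_pairs(clips: Iterable[str], translate: Model) -> List[Tuple[str, str]]:
    """Turn unlabeled source clips into (source, target) training pairs."""
    return [(clip, translate(clip)) for clip in clips]


# Hypothetical toy components; the clip ID and outputs are made up.
hokkien_speech_to_mandarin_text = from_table({"hokkien_clip_01": "你好吗"})
mandarin_text_to_english_speech = from_table({"你好吗": "How are you?"})

hokkien_to_english = via_pivot(hokkien_speech_to_mandarin_text, mandarin_text_to_english_speech)
print(make_pseudo_pairs(["hokkien_clip_01"], hokkien_to_english))
# [('hokkien_clip_01', 'How are you?')]
```

The same composition works in the opposite direction, from English through Mandarin to Hokkien, given the corresponding components.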


Using a speech encoder, Meta was additionally able to encode Hokkien speech embeddings into the same semantic space as other languages, where they could then be aligned with English speech and text. From the matched English text, Meta in turn synthesized English speech, obtaining parallel Hokkien and English audio. Meta calls this process "speech mining."
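The core of such a mining step can be sketched as nearest-neighbor search in a shared embedding space. The sketch below assumes embeddings have already been computed by a speech encoder for Hokkien and an encoder for English; the vectors, IDs, and the 0.9 threshold are hypothetical. It scores candidates with plain cosine similarity, which is a simplification: Meta's actual pipeline may use a different scoring function and works at far larger scale.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mine_pairs(
    hokkien: Dict[str, Vector],
    english: Dict[str, Vector],
    threshold: float = 0.9,
) -> List[Tuple[str, str, float]]:
    """For each Hokkien clip, keep its closest English item if it is similar enough."""
    pairs: List[Tuple[str, str, float]] = []
    if not english:
        return pairs
    for h_id, h_vec in hokkien.items():
        best_id, best_score = max(
            ((cand_id, cosine(h_vec, cand_vec)) for cand_id, cand_vec in english.items()),
            key=lambda item: item[1],
        )
        if best_score >= threshold:
            pairs.append((h_id, best_id, best_score))
    return pairs


# Hypothetical 3-dimensional embeddings (real ones have hundreds of dimensions).
hokkien_embeddings = {"hok_a": [0.9, 0.1, 0.0], "hok_b": [0.0, 0.2, 0.9]}
english_embeddings = {"en_x": [1.0, 0.0, 0.0], "en_y": [0.0, 1.0, 0.0]}

for h_id, e_id, score in mine_pairs(hokkien_embeddings, english_embeddings):
    print(h_id, e_id, round(score, 3))  # prints: hok_a en_x 0.994
```

Here "hok_b" finds no English neighbor above the threshold and is dropped. In the process the article describes, each matched English sentence would then be turned into English speech with text-to-speech, giving parallel audio.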

For speech-to-speech translation, Meta used speech-to-unit translation (S2UT), an approach it previously pioneered that translates speech input directly into a sequence of acoustic units. On top of this, Meta used UnitY, a two-pass decoding mechanism: in the first pass, the decoder generates text in a related language (Mandarin), and in the second pass it generates acoustic units.
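Two ideas in this paragraph can be made concrete. First, discrete acoustic units: in Meta's published S2UT research, units are typically obtained by clustering frame-level features from a self-supervised speech model with k-means and collapsing consecutive repeats. Second, UnitY's two passes: one decoder emits intermediate text, and a second emits units conditioned on it. The minimal Python sketch below assumes precomputed feature frames and centroids; the decoder components are hypothetical stubs rather than Meta's model, and turning units back into audio (the job of a separate vocoder) is not shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]


def quantize_to_units(frames: List[Frame], centroids: List[Frame]) -> List[int]:
    """Assign each feature frame to its nearest centroid (squared Euclidean
    distance) and drop consecutive repeats, yielding a reduced unit sequence."""
    units: List[int] = []
    for frame in frames:
        nearest = min(
            range(len(centroids)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(frame, centroids[k])),
        )
        if not units or units[-1] != nearest:
            units.append(nearest)
    return units


@dataclass
class TwoPassTranslator:
    """Hypothetical UnitY-style wrapper: pass 1 decodes intermediate text,
    pass 2 decodes acoustic units from the encoder states and that text."""
    encode: Callable[[List[Frame]], List[Frame]]
    decode_text: Callable[[List[Frame]], List[str]]
    decode_units: Callable[[List[Frame], List[str]], List[int]]

    def translate(self, source: List[Frame]) -> Tuple[List[str], List[int]]:
        states = self.encode(source)
        text = self.decode_text(states)          # first pass: e.g. Mandarin text
        units = self.decode_units(states, text)  # second pass: target-speech units
        return text, units


# Toy data: 2-D "features" and two centroids (real systems use many more clusters).
centroids = [[0.0, 0.0], [1.0, 1.0]]
target_frames = [[0.1, 0.0], [0.0, 0.2], [0.9, 1.0], [1.0, 0.8], [0.1, 0.1]]
print(quantize_to_units(target_frames, centroids))  # [0, 1, 0]

# Hypothetical stubs standing in for trained networks.
toy = TwoPassTranslator(
    encode=lambda frames: frames,
    decode_text=lambda states: ["你好"],
    decode_units=lambda states, text: [0, 1, 0] if text == ["你好"] else [],
)
print(toy.translate([[0.5, 0.5]]))  # (['你好'], [0, 1, 0])
```

Splitting decoding into two passes lets the unit decoder condition on an explicit text hypothesis; the stubs here only illustrate that data flow, not how the real networks compute it.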

Image: Meta

According to Meta, the methods first developed for Hokkien can be applied to many other written and unwritten languages. Meta is releasing the system and a large corpus of speech-to-speech translations as open source for the development of other translation systems. A Hokkien demo is available at Hugging Face.

Meta has been researching AI translation for years

The Hokkien model can currently translate only one full sentence at a time. Nevertheless, Meta sees the model as a step towards a future with simultaneous translation between many languages. To get there, Meta relies on AI training without human-labeled data (unsupervised and self-supervised learning) on large amounts of speech and text, combined with speech recognition, text-to-text translation, and text-to-speech synthesis.

Meta's development approach for AI translation of many languages in real time relies on self-supervised learning. | Image: Meta

"Our progress in unsupervised learning demonstrates the feasibility of building high-quality speech-to-speech translation models without any human annotations," Meta writes.


For Meta CEO Mark Zuckerberg, universal translation is a "superpower that people have always dreamed of." In 2018, Meta unveiled an AI system for back-translation trained without supervision; in 2020, M2M-100, a system capable of translating between 100 languages; and in 2021, its successor, which took top marks in the WMT 2021 translation benchmark.

In February 2022, Meta introduced the "No Language Left Behind" project, aimed at real-time universal translation, even of rare languages. It was followed in summer 2022 by NLLB-200, a model that translates between 200 languages.

All of Meta's AI translation research comes together under the umbrella of the Universal Speech Translator project, which also embodies the company's grand vision.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
  • Meta wants to eliminate language barriers for good with its "Universal Speech Translator" project.
  • A new AI translation system shows that this is possible even for rare, mostly spoken languages like Hokkien.
  • The system learns translation skills without requiring humans to first label training data. Meta is releasing the system as open source.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.