
Microsoft's new Translator achieves better results when translating into numerous languages. The basis for this is an AI architecture that lets neural networks process tasks more specifically while computing more efficiently.


The "Mixture-of-Experts" (MoE) architecture replaces a single transformer network with a set of so-called expert networks. A gating mechanism in the model decides which expert network handles which task; an expert network could specialize in an individual language, for example.

The architecture also grows in width rather than depth, packing more parameters into fewer layers. The goal of the MoE approach is to deliver better results with less computational effort.
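The routing idea can be sketched in a few lines. The expert networks, gating weights, and dimensions below are toy examples invented for illustration, not Microsoft's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

D, H, NUM_EXPERTS = 16, 32, 4  # toy dimensions, not real model sizes

# Each expert is a small feed-forward block; together the experts hold
# NUM_EXPERTS times the parameters of a single dense block.
experts = [
    (rng.standard_normal((D, H)) * 0.1, rng.standard_normal((H, D)) * 0.1)
    for _ in range(NUM_EXPERTS)
]
gate_w = rng.standard_normal((D, NUM_EXPERTS)) * 0.1  # learned router weights

def moe_layer(x):
    """Route the input vector x to the single best-scoring expert (top-1)."""
    scores = x @ gate_w            # one score per expert
    k = int(np.argmax(scores))     # pick exactly one expert
    w_in, w_out = experts[k]
    h = np.maximum(x @ w_in, 0.0)  # the chosen expert's feed-forward pass
    return h @ w_out, k

x = rng.standard_normal(D)
y, chosen = moe_layer(x)
print(f"input routed to expert {chosen}; output shape {y.shape}")
```

Because only one of the four expert blocks runs per input, the compute per token stays roughly that of a single dense block, while the total parameter count scales with the number of experts.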

Translator: up to 15 percent better translations

Microsoft is now using the MoE approach in its own Translator service, improving the program's performance across the board. English-to-Slovak translations benefit the most, with about 15 percent better results, followed by English to Bosnian and to Bulgarian, each just under 12 percent. Microsoft evaluated the translations in blind tests with human judges.

The MoE architecture also provides better results for Microsoft's translation AI. | Image: Microsoft

Microsoft also trained the MoE network to be "sparse," in line with current practice. Sparsely trained neural networks activate only those elements that are needed for the task at hand. In conventionally trained AI models, the entire model is active for every task, which costs more energy. Microsoft compares this to heating a house with individual radiators in each room rather than with a central furnace.
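The radiator analogy can be made concrete with a back-of-the-envelope comparison of how many parameters are active per input. The sizes below are invented for illustration:

```python
# Toy comparison of active parameters per token: a dense model uses all of
# its weights for every input, while a top-1 sparse MoE model of the same
# total size activates only one expert's weights.
D, H, NUM_EXPERTS = 16, 32, 4

params_per_expert = D * H + H * D       # weights of one feed-forward expert
total_params = NUM_EXPERTS * params_per_expert

dense_active = total_params             # dense: everything runs every time
sparse_active = params_per_expert       # top-1 MoE: one expert runs

print(f"total parameters:   {total_params}")
print(f"active per token:   dense={dense_active}, sparse={sparse_active}")
print(f"activated fraction: {sparse_active / total_params:.0%}")
```

With four experts, only a quarter of the weights do work on any given input; adding more experts grows the model's capacity without growing the per-token activation.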

Microsoft also used transfer learning during AI training: the model recognizes linguistic elements that different languages share, so language-comprehension capabilities can be transferred from one language to another. Languages with few speakers, for which little original training material exists, benefit particularly from this.

MoE architecture gains acceptance

The new Translator model is part of Microsoft's "Project Z-Code" for multimodal AI models that combine data such as text, vision, audio and speech. The goal, according to Microsoft, is AI models that can speak, see, hear and understand. The models trained as part of Project Z-Code are to be based on the MoE approach.

"This architecture allows massive scale in the number of model parameters while keeping the amount of compute constant," Microsoft writes.

In the Translator example, it used to take 20 different models to translate between ten languages using English as the bridge language. The new Z-code production model can now directly translate all ten languages into and out of English. Larger Z-code research models can translate up to 101 languages without the English bridge language, according to Microsoft. This results in 10,000 translation paths.
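The arithmetic behind those figures is simple to check. With a bridge language, each of the n other languages needs one model into English and one out of English; with direct translation, every ordered pair of distinct languages is its own path (101 × 100 = 10,100, which Microsoft rounds to 10,000):

```python
def bridge_models(n):
    """Pivot translation through English: one model into English and one
    out of English for each of the n other languages, i.e. 2 * n models."""
    return 2 * n

def direct_paths(n):
    """Direct translation among n languages: every ordered pair of
    distinct languages is a separate translation path."""
    return n * (n - 1)

print(bridge_models(10))   # models needed for ten languages via English
print(direct_paths(101))   # directed paths among 101 languages
```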


The MoE approach is not new. Google has been researching it since the summer of 2020 and unveiled a giant AI language model constructed using the MoE principle in December 2021. Before that, Meta shared a powerful translation AI in November 2021, in which individual sections in the neural network take on the expert role for different languages. Google's translation AI M4, Google's 1.6 trillion-parameter Switch Transformer and China's 1.75 trillion-parameter Wu Dao 2.0 also rely on MoE architectures.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.


Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.