Microsoft's new Translator achieves better results across numerous languages. The foundation is an AI architecture that lets neural networks handle tasks with greater specialization while computing more efficiently.
The "Mixture-of-Experts" (MoE) architecture replaces a single transformer network with a collection of so-called expert networks. The model then decides which expert network a given task is delegated to. An expert network could, for example, represent an individual language.
The architecture also scales in width rather than depth, which allows more parameters with fewer layers. The goal of the MoE approach is to achieve better results with less computational effort.
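To make the routing idea concrete, here is a minimal sketch of a sparsely gated MoE layer in PyTorch. The top-1 gating, layer sizes, and class names are illustrative assumptions, not details of Microsoft's Z-code models.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Toy Mixture-of-Experts feed-forward layer with top-1 routing."""

    def __init__(self, d_model=512, d_hidden=2048, num_experts=8):
        super().__init__()
        # Each expert is an ordinary transformer-style feed-forward block.
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.ReLU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_experts)
        ])
        # The gate decides which expert handles each token.
        self.gate = nn.Linear(d_model, num_experts)

    def forward(self, x):                          # x: (tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)   # routing probabilities
        best = scores.argmax(dim=-1)               # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = best == i
            if mask.any():                         # run only the experts that were picked
                out[mask] = scores[mask][:, i:i + 1] * expert(x[mask])
        return out

layer = MoELayer()
tokens = torch.randn(16, 512)
print(layer(tokens).shape)                         # torch.Size([16, 512])
```

Adding experts multiplies the layer's parameter count, but each token still passes through only one expert, so per-token compute stays roughly flat; the learned gate, not a hand-written rule, decides which expert suits which input.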
Translator: up to 15 percent better translations
Microsoft is now using the MoE approach for its own Translator service, improving the program's performance across the board. English-to-Slovak translations benefit the most, improving by about 15 percent, followed by English to Bosnian and English to Bulgarian at just under 12 percent each. Microsoft evaluated the translations in blind tests with human reviewers.
In line with current practice, Microsoft also trained the MoE network to be "sparse." Neural networks trained this way activate only the parts that are actually needed for the task at hand. In conventionally trained AI models, the entire network is active for every task, which requires more energy. Microsoft compares this to heating only the rooms in use with individual radiators instead of heating the whole house with a central furnace.
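The analogy can be put into rough numbers. The back-of-the-envelope comparison below uses made-up layer sizes and expert counts, not Microsoft's real figures, to show how a sparsely activated model can hold far more parameters than a dense one while using only a fraction of them for any given token.

```python
# Illustrative sizes only; not Microsoft's actual model dimensions.
d_model, d_hidden = 512, 2048
num_experts, top_k = 64, 1                    # sparse routing: 1 of 64 experts per token

ffn_params = 2 * d_model * d_hidden           # parameters in one feed-forward block
dense_total = dense_active = ffn_params       # dense model: everything runs for every token
sparse_total = num_experts * ffn_params       # MoE: total parameters grow with the expert count...
sparse_active = top_k * ffn_params            # ...but only the routed experts run per token

print(f"dense : {dense_total:>12,} total params, {dense_active:>12,} active per token")
print(f"sparse: {sparse_total:>12,} total params, {sparse_active:>12,} active per token")
```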
Microsoft also used so-called transfer learning in training, which picks up on linguistic elements that different languages share so that language understanding learned for one language carries over to another. Less widely spoken languages, for which little original training material exists, benefit most from this.
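As a toy illustration of that transfer effect, the sketch below pretrains a model on a plentiful "high-resource" task and then fine-tunes it on a scarce "low-resource" task that shares most of its structure. The model, data, and sizes are invented stand-ins, not Microsoft's training pipeline.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
shared_map = torch.randn(16, 16)                 # structure both "languages" share
low_map = shared_map + 0.1 * torch.randn(16, 16) # low-resource task: mostly shared structure

def make_batch(n, mapping):
    x = torch.randn(n, 16)
    return x, x @ mapping

def train(model, mapping, n, steps):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(steps):
        x, y = make_batch(n, mapping)
        loss = F.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()

pretrained = nn.Linear(16, 16)
train(pretrained, shared_map, n=64, steps=500)   # plenty of "high-resource" data

transferred = copy.deepcopy(pretrained)          # reuse the learned parameters
scratch = nn.Linear(16, 16)                      # identical model, random start
print("transferred :", train(transferred, low_map, n=4, steps=20))
print("from scratch:", train(scratch, low_map, n=4, steps=20))
# Starting from the pretrained weights reaches a far lower loss on the
# scarce data than training from scratch with the same few updates.
```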
MoE architecture gains acceptance
The new Translator model is part of Microsoft's "Project Z-Code" for multimodal AI models that combine data such as text, vision, audio and speech. The goal, according to Microsoft, is AI models that can speak, see, hear and understand. The models trained as part of Project Z-Code are to be based on the MoE approach.
"This architecture allows massive scale in the number of model parameters while keeping the amount of compute constant," Microsoft writes.
In the Translator example, translating between ten languages using English as the bridge language used to require 20 different models. The new Z-code production model can now translate all ten languages directly into and out of English. Larger Z-code research models can translate between up to 101 languages without English as a bridge, according to Microsoft, resulting in around 10,000 direct translation paths.
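The arithmetic behind those counts is straightforward; the small calculation below simply reproduces the figures cited above.

```python
# Bridging through English: each language needs an into-English and an
# out-of-English model.
non_english = 10
bridge_models = 2 * non_english
print(bridge_models)                          # 20 separate models

# Direct translation: every ordered source -> target combination.
languages = 101
direct_paths = languages * (languages - 1)
print(direct_paths)                           # 10,100 -- on the order of 10,000 paths
```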
The MoE approach is not new. Google has been researching it since the summer of 2020 and unveiled a giant AI language model built on the MoE principle in December 2021. Shortly before that, in November 2021, Meta shared a powerful translation AI in which individual sections of the neural network take on the expert role for different languages. Google's translation AI M4, Google's 1.6-trillion-parameter Switch Transformer, and China's 1.75-trillion-parameter Wu Dao 2.0 also rely on MoE architectures.