
Researchers at Google have figured out how to train AI language models faster and get better results using an unusual approach: letting smaller AI models teach the bigger ones.


A joint team from Google Research and DeepMind has developed a training method called SALT (Small model aided large model training) that cuts training time by up to 28 percent while improving performance. The key innovation? Using smaller language models as assistant teachers.

The process happens in two stages. First, the large model learns from a smaller model through a process called knowledge distillation, where one AI model teaches another by sharing both its answers and how confident it is in those answers. While knowledge distillation usually involves larger models teaching smaller ones, the Google team found it can work the other way around - at least during certain parts of training. In the second stage, the large model switches to conventional training methods.
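As a rough illustration of what that first stage involves, here is a minimal PyTorch-style sketch of a distillation loss with the usual roles reversed: the small model supplies the softened predictions and the large model is the student. The temperature, the mixing weight, and the function names are illustrative assumptions, not details from Google's paper.

```python
# Illustrative sketch only - not Google's exact SALT implementation.
# Logits are assumed to be flattened to shape [num_tokens, vocab_size].
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=2.0, alpha=0.5):
    """Blend next-token cross-entropy with a KL term that pulls the large
    (student) model toward the small teacher's softened distribution."""
    # Soft targets from the small teacher, smoothed by the temperature.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between student and teacher, with the standard T^2 scaling.
    kd = F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2
    # Ordinary cross-entropy against the ground-truth tokens.
    ce = F.cross_entropy(student_logits, targets)
    return alpha * kd + (1.0 - alpha) * ce
```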

The smaller model proves especially helpful in areas where it already makes solid predictions. For these simpler tasks, the larger model learns more quickly and reliably, before switching to traditional training for more complex challenges.
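Putting the two stages together, the overall training loop might look roughly like the sketch below, which reuses the distillation loss from the previous snippet: the large model distills from the small one for an early slice of training, then switches to the standard cross-entropy objective. The switch point (here 30 percent of steps), the function signature, and the model and data names are placeholder assumptions, not values from the paper.

```python
# Hedged sketch of a two-stage schedule in the spirit of SALT as described above.
import torch
import torch.nn.functional as F

def train_with_salt(large_model, small_model, data_loader, optimizer,
                    total_steps, distill_fraction=0.3):
    """Stage 1: learn from the small model via distillation.
    Stage 2: continue with conventional cross-entropy training."""
    switch_step = int(distill_fraction * total_steps)  # assumed fraction
    for step, (tokens, targets) in enumerate(data_loader):
        if step >= total_steps:
            break
        student_logits = large_model(tokens)
        if step < switch_step:
            with torch.no_grad():
                teacher_logits = small_model(tokens)  # small model acts as teacher
            loss = distillation_loss(student_logits, teacher_logits, targets)
        else:
            loss = F.cross_entropy(student_logits, targets)  # conventional training
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```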


SALT could make training AI models more accessible

The team tested SALT by using a 1.5 billion parameter model to train a 2.8 billion parameter model. The results were impressive: the larger model reached its performance targets in just 70 percent of the usual training time, and then went on to score better on benchmark tests.

The improvements really showed up after fine-tuning for specific tasks. For math problems, models trained with SALT reached 34.87 percent accuracy compared to 31.84 percent for traditionally trained models. Reading comprehension scores jumped from 63.7 percent to 67 percent.

The researchers also created an enhanced version called SALTDS that carefully selects training data, focusing on examples where the smaller model performs well.
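The article doesn't spell out how SALTDS picks its data, but the idea of focusing on examples where the smaller model performs well could be approximated by something like the following sketch, which keeps only training examples on which the small model's loss falls below a threshold. The threshold value and the per-example loss criterion are assumptions made for illustration.

```python
# Rough, assumption-laden sketch of the data-selection idea behind SALTDS.
import torch
import torch.nn.functional as F

@torch.no_grad()
def select_easy_examples(small_model, dataset, loss_threshold=2.5):
    """Return indices of examples where the small model's loss is low,
    i.e. where it already makes solid predictions."""
    selected = []
    for idx, (tokens, targets) in enumerate(dataset):
        logits = small_model(tokens)
        loss = F.cross_entropy(logits, targets)
        if loss.item() < loss_threshold:  # small model handles this example well
            selected.append(idx)
    return selected
```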

While SALT can help create more powerful large language models, it might be especially valuable for organizations working with limited resources. Instead of needing access to the biggest AI models, institutions could use SALT to develop capable language models with more modest computing power, the team said.

Summary
  • Google researchers have developed a new method called SALT that speeds up the training of large language models by up to 28 percent, while improving their performance by using smaller AI models as assistant teachers.
  • The method works in two stages: First, the large model learns from the smaller model through knowledge distillation, with the smaller model helping in areas where it can already make good predictions. The large model is then trained conventionally.
  • In tests, a 2.8 billion-parameter model trained with SALT achieved the same performance as a conventionally trained model in just 70 percent of the usual training time, and even outperformed it after further fine-tuning, particularly in arithmetic and text comprehension.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.