Researchers at Google have found a way to train large AI language models faster and to a higher quality using an unusual approach: letting smaller AI models help teach the bigger ones.
A joint team from Google Research and DeepMind has developed a training method called SALT (Small model aided large model training) that cuts training time by up to 28 percent while improving performance. The key innovation? Using smaller language models as assistant teachers.
The process happens in two stages. First, the large model learns from a smaller model through knowledge distillation, a process in which one AI model teaches another by sharing both its answers and how confident it is in those answers. Knowledge distillation usually means a larger model teaching a smaller one, but the Google team found it can also work in reverse, at least during the early phase of training. In the second stage, the large model switches to conventional training.
The smaller model is especially helpful in areas where it already makes solid predictions. On these easier examples, the larger model learns more quickly and reliably during the first stage, before switching to conventional training for the harder material.
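To make the mechanism concrete, here is a minimal sketch of what a stage-one objective could look like, assuming a standard knowledge-distillation setup in PyTorch. The function name, temperature, and weighting are illustrative placeholders, not details from the paper.

```python
import torch.nn.functional as F

def stage_one_loss(student_logits, teacher_logits, labels,
                   temperature=2.0, alpha=0.5):
    """Illustrative stage-one objective: the large (student) model trains on a
    mix of the small teacher's soft predictions and the true next tokens.
    Hyperparameters here are assumptions, not values from the paper."""
    # Soft targets: the small model's full probability distribution,
    # which also conveys how confident it is in each prediction.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    distill = F.kl_div(log_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2
    # Standard next-token cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                         labels.view(-1))
    return alpha * distill + (1 - alpha) * ce

# Stage two simply drops the distillation term and continues with
# conventional training, i.e. the cross-entropy loss alone.
```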
SALT could make training AI models more accessible
The team tested SALT by using a 1.5-billion-parameter model to help train a 2.8-billion-parameter model. The results were impressive: the larger model reached its performance targets in just 70 percent of the usual training time and then went on to score better on a range of benchmarks.
The improvements really showed up after fine-tuning for specific tasks. For math problems, models trained with SALT reached 34.87 percent accuracy compared to 31.84 percent for traditionally trained models. Reading comprehension scores jumped from 63.7 percent to 67 percent.
The researchers also created an enhanced version called SALTDS that carefully selects training data, focusing on examples where the smaller model performs well.
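The article does not spell out the selection rule, but a plausible sketch of the idea, assuming examples are kept when the small model's next-token loss is low, might look like the following. The `small_model`, `dataset`, threshold, and Hugging Face-style output are all hypothetical choices for illustration.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def select_easy_examples(small_model, dataset, loss_threshold=2.5):
    """Keep training examples on which the small model already does well
    (low next-token loss). The threshold and loss measure are assumptions;
    the actual SALTDS criterion may differ."""
    selected = []
    for input_ids, labels in dataset:           # one tokenized example at a time
        logits = small_model(input_ids).logits  # assumes an HF-style model output
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)),
                               labels.view(-1))
        if loss.item() < loss_threshold:
            selected.append((input_ids, labels))
    return selected
```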
While SALT can help create more powerful large language models, it might be especially valuable for organizations working with limited resources. Instead of needing access to the biggest AI models, institutions could use SALT to develop capable language models with more modest computing power, the team said.