Researchers at Google have found a way to train large AI language models faster and to a higher quality using an unusual approach: letting smaller AI models help teach the bigger ones.
A joint team from Google Research and DeepMind has developed a training method called SALT (Small model aided large model training) that cuts training time by up to 28 percent while improving performance. The key innovation? Using smaller language models as assistant teachers.
The process happens in two stages. First, the large model learns from a smaller model through knowledge distillation, a process in which one AI model teaches another by sharing both its answers and how confident it is in those answers. Knowledge distillation usually means a larger model teaching a smaller one, but the Google team found it can also work in reverse, at least during the early phase of training. In the second stage, the large model switches to conventional training.
The smaller model is especially helpful in areas where it already makes solid predictions. On these easier examples, the larger model learns more quickly and reliably during the first stage, before switching to conventional training for the harder material.
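To make the mechanism concrete, here is a minimal sketch of what a stage-one objective could look like, assuming a standard knowledge-distillation setup in PyTorch. The function name, temperature, and weighting are illustrative placeholders, not details from the paper.

```python
import torch.nn.functional as F

def stage_one_loss(student_logits, teacher_logits, labels,
                   temperature=2.0, alpha=0.5):
    """Illustrative stage-one objective: the large (student) model trains on a
    mix of the small teacher's soft predictions and the true next tokens.
    Hyperparameters here are assumptions, not values from the paper."""
    # Soft targets: the small model's full probability distribution,
    # which also conveys how confident it is in each prediction.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    distill = F.kl_div(log_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2
    # Standard next-token cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                         labels.view(-1))
    return alpha * distill + (1 - alpha) * ce

# Stage two simply drops the distillation term and continues with
# conventional training, i.e. the cross-entropy loss alone.
```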
SALT could make training AI models more accessible
The team tested SALT by using a 1.5-billion-parameter model to help train a 2.8-billion-parameter model. The results were impressive: the larger model reached its performance targets in just 70 percent of the usual training time and then went on to score better on a range of benchmarks.
The improvements really showed up after fine-tuning for specific tasks. For math problems, models trained with SALT reached 34.87 percent accuracy compared to 31.84 percent for traditionally trained models. Reading comprehension scores jumped from 63.7 percent to 67 percent.
The researchers also created an enhanced version called SALTDS that carefully selects training data, focusing on examples where the smaller model performs well.
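The article does not spell out the selection rule, but a plausible sketch of the idea, assuming examples are kept when the small model's next-token loss is low, might look like the following. The `small_model`, `dataset`, threshold, and Hugging Face-style output are all hypothetical choices for illustration.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def select_easy_examples(small_model, dataset, loss_threshold=2.5):
    """Keep training examples on which the small model already does well
    (low next-token loss). The threshold and loss measure are assumptions;
    the actual SALTDS criterion may differ."""
    selected = []
    for input_ids, labels in dataset:           # one tokenized example at a time
        logits = small_model(input_ids).logits  # assumes an HF-style model output
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)),
                               labels.view(-1))
        if loss.item() < loss_threshold:
            selected.append((input_ids, labels))
    return selected
```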
While SALT can help create more powerful large language models, it might be especially valuable for organizations working with limited resources. Instead of needing access to the biggest AI models, institutions could use SALT to develop capable language models with more modest computing power, the team said.