Why an artificial intelligence must go to school

The big advances in AI development come from more data, bigger networks, and more computing power. Does AI have a future outside the cloud?

The trend in language AIs is toward larger and larger models that swallow more and more data. These giants include Google's BERT, OpenAI's GPT-2, Facebook's RoBERTa and Nvidia's Megatron. Megatron is the most recent of them, 24 times larger than BERT-large and five times larger than GPT-2.

But large AI models require a lot of computing power: Nvidia recommends 512 graphics chips for training Megatron. And anyone who wants to retrain the latest version of OpenAI's GPT-2 will push even fast server graphics cards to their limits.

Because large language models require so much computing power and energy, they only run in the cloud. Smaller models can run locally, without an Internet connection, on a smartphone or a robot - but they perform significantly worse.

So smaller AI models are needed that can still match the performance of their giant relatives. But how can this be done?

Back to School (of AI)

The solution could be a kind of AI school in which the small AIs learn from the big ones.

AI researchers call this process distillation: a large AI model acts as the teacher and a small one as the student. During training, the large AI passes on its knowledge: in the case of a language AI, for example, the 20 most likely words that complete an incomplete sentence, along with their probabilities. The small AI model thus learns to reproduce the results of the large AI model - without needing its size.
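The teacher-student idea can be illustrated with a toy sketch in Python. It is not the training code of any of the models named here: the five-word vocabulary, the teacher scores, the function names and the choice of k = 3 (instead of 20) are all assumptions made for illustration, and the "student" is just a list of scores for a single incomplete sentence rather than a neural network.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_targets(teacher_probs, k):
    """Keep only the teacher's k most likely words and renormalize them."""
    order = sorted(range(len(teacher_probs)),
                   key=lambda i: teacher_probs[i], reverse=True)
    keep = order[:k]
    mass = sum(teacher_probs[i] for i in keep)
    return {i: teacher_probs[i] / mass for i in keep}

def distillation_loss(student_logits, targets):
    """Cross-entropy between the teacher's soft targets and the student."""
    student_probs = softmax(student_logits)
    return -sum(p * math.log(student_probs[i]) for i, p in targets.items())

def train_student(teacher_logits, vocab_size, k=3, steps=500, lr=0.5):
    """Fit toy student scores to the teacher's top-k word probabilities."""
    targets = top_k_targets(softmax(teacher_logits), k)
    student_logits = [0.0] * vocab_size  # the student starts out knowing nothing
    for _ in range(steps):
        probs = softmax(student_logits)
        # Gradient of the cross-entropy w.r.t. each score:
        # student probability minus teacher target.
        for i in range(vocab_size):
            student_logits[i] -= lr * (probs[i] - targets.get(i, 0.0))
    return student_logits, targets

if __name__ == "__main__":
    # Hypothetical teacher scores for five candidate next words.
    teacher_logits = [4.0, 2.5, 1.0, 0.2, -1.0]
    student_logits, targets = train_student(teacher_logits, vocab_size=5)
    print("loss before:", distillation_loss([0.0] * 5, targets))
    print("loss after: ", distillation_loss(student_logits, targets))
```

The student is never shown a single "correct" word. It only sees the teacher's probability distribution over its top candidates, and the loss shrinks as the student's own distribution moves toward it. In a real setup the same loss would be averaged over many sentences and used to update the weights of a smaller network.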

Researchers at Hugging Face have now applied this method to Google's language AI BERT. The result: BERT's student DistilBERT is 40 percent smaller, 60 percent faster, and achieves 95 percent of BERT's performance. In the future, DistilBERT is to be shrunk further using other methods, such as removing some network connections.

Google has similarly shrunk its own next-gen Assistant, which is expected to run locally on Pixel smartphones without an Internet connection by the end of the year. The almost 100-gigabyte voice model was reduced to just under 0.5 gigabytes for this.
