A new optimization technique called DisTrO reduces communication between GPUs during AI training by up to 10,000 times. This breakthrough could make it possible to train large language models over standard internet connections.
Researchers have created DisTrO, a new family of optimizers that dramatically reduces data exchange between GPUs when training large AI models, including large language models (LLMs) and diffusion models.
Traditional distributed training requires synchronizing full gradients between all participating accelerators (GPUs, TPUs) after each training step. This process demands extremely high bandwidth and specialized high-speed connections.
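To make that cost concrete, here is a minimal sketch of the conventional synchronization step using PyTorch's distributed all-reduce. This illustrates the baseline DisTrO improves on, not DisTrO itself; the single-process "gloo" group, the tiny stand-in model, and the batch size are arbitrary choices so the snippet runs on one machine.

```python
import os
import torch
import torch.distributed as dist

# Sketch of conventional data-parallel gradient synchronization: the
# per-step exchange whose bandwidth DisTrO is designed to cut.
# A single-process "gloo" group is used so this runs locally; in a real
# cluster every worker would run this after each backward pass.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = torch.nn.Linear(1024, 1024)               # stand-in for a large model
loss = model(torch.randn(32, 1024)).pow(2).mean()
loss.backward()

# Every parameter's full gradient crosses the network on every step.
for p in model.parameters():
    dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
    p.grad /= dist.get_world_size()

traffic = sum(p.grad.numel() * p.grad.element_size() for p in model.parameters())
print(f"~{traffic / 1e6:.1f} MB of gradients exchanged this step")

dist.destroy_process_group()
```

For a billion-parameter model, that full-gradient exchange amounts to gigabytes per worker on every training step, which is why conventional setups rely on specialized interconnects.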
DisTrO slashes these communication requirements by four to five orders of magnitude. During the pre-training of a 1.2-billion-parameter language model, the required bandwidth per training step dropped from 74.4 GB to just 86.8 MB, an 857-fold reduction.
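Those figures are easy to sanity-check. A quick back-of-the-envelope calculation (treating GB and MB as decimal units, 10^9 and 10^6 bytes) reproduces the reported reduction factor:

```python
# Sanity check of the reported per-step figures for the 1.2B-parameter run
# (GB/MB read as decimal units; the byte counts are the paper's own numbers).
full_sync = 74.4e9   # bytes per step with conventional full-gradient sync
distro    = 86.8e6   # bytes per step with DisTrO
print(f"reduction factor: {full_sync / distro:.0f}x")   # ~857x
```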
The team reports that reductions of up to 10,000 times are possible during fine-tuning. DisTrO works independently of network topology and neural network architecture.
DisTrO aims to make AI training more accessible
The researchers believe DisTrO could democratize the training of large AI models. The drastically reduced bandwidth requirements could enable model training via normal internet connections, eliminating the need for specialized high-speed links.
This advancement could allow researchers and organizations with limited resources to participate in developing state-of-the-art AI models. Until now, this capability has been limited to governments and large tech companies in wealthy countries with the necessary funding and infrastructure.
The team suggests DisTrO could enable a fully decentralized training network. The method is highly resilient to node failures or degradation and can easily accommodate new nodes joining.
The researchers also see great potential for applications like federated learning, where models are trained collaboratively while keeping training data private and decentralized. DisTrO could make federated learning practical for efficiently training LLMs over the internet.