
A new optimization technique called DisTrO reduces communication between GPUs during AI training by up to 10,000 times. This breakthrough could make it possible to train large language models over standard Internet connections.


Researchers have created DisTrO, a new family of optimizers that dramatically reduces data exchange between GPUs when training large AI models, including large language models (LLMs) and diffusion models.

Traditional distributed training requires synchronizing full gradients between all participating accelerators (GPUs, TPUs) after each training step. This process demands extremely high bandwidth and specialized high-speed connections.
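
For context, here is a minimal sketch of what that per-step synchronization looks like in a conventional data-parallel setup, written with PyTorch's torch.distributed; the function names and fp16 sizing are illustrative assumptions, not details from the DisTrO paper.

```python
# Minimal sketch of conventional data-parallel training (NOT DisTrO):
# every worker all-reduces the full gradient of every parameter on every
# step, which is what drives the extreme bandwidth requirements.
import torch
import torch.distributed as dist

def training_step(model, optimizer, loss_fn, batch, world_size):
    inputs, targets = batch
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    # Full gradient synchronization across all workers (e.g. NCCL all-reduce).
    # For a 1.2 billion parameter model in fp16, this moves roughly 2.4 GB of
    # gradient data per worker on every single training step.
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad /= world_size

    optimizer.step()
    return loss.item()
```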

DisTrO slashes these communication requirements by up to four to five orders of magnitude. During the pre-training of a 1.2 billion parameter language model, the required bandwidth per training step dropped from 74.4 GB to just 86.8 MB - an 857-fold reduction.
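
The 857-fold figure follows directly from the two reported numbers:

```python
# Reduction factor implied by the reported per-step bandwidth figures.
baseline_mb = 74.4 * 1000   # 74.4 GB per step with full gradient synchronization
distro_mb = 86.8            # 86.8 MB per step with DisTrO
print(round(baseline_mb / distro_mb))  # -> 857
```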


The team reports that reductions of up to 10,000 times are possible during fine-tuning. DisTrO works independently of network topology and neural network architecture.

DisTrO aims to make AI training more accessible

The researchers believe DisTrO could democratize the training of large AI models. The drastically reduced bandwidth requirements could enable model training via normal internet connections, eliminating the need for specialized high-speed links.

This advancement could allow researchers and organizations with limited resources to participate in developing state-of-the-art AI models. Until now, this capability has been limited to governments and large tech companies in wealthy countries with the necessary funding and infrastructure.

The team suggests DisTrO could enable a fully decentralized network for collaboratively training AI models. The method is highly resilient to node failures or degradation and can easily incorporate new nodes.

The researchers also see great potential for applications like federated learning, where models are trained collaboratively while keeping training data private and decentralized. DisTrO could make federated learning practical for efficiently training LLMs over the internet.
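
As a rough illustration of that setting, the sketch below shows plain federated averaging (FedAvg), not DisTrO's own update rule: each participant trains on its private data locally, and only model weights - never raw data - are sent to a coordinator.

```python
# Generic federated-averaging (FedAvg) sketch -- not DisTrO's algorithm.
# Participants train on private local data; only model weights are shared
# and combined into a new global model each round.
from typing import Dict, List
import torch

def federated_average(local_states: List[Dict[str, torch.Tensor]],
                      num_examples: List[int]) -> Dict[str, torch.Tensor]:
    """Weighted average of the participants' model state dicts."""
    total = sum(num_examples)
    global_state = {}
    for name in local_states[0]:
        global_state[name] = sum(
            state[name].float() * (n / total)
            for state, n in zip(local_states, num_examples)
        )
    return global_state
```

In such a setup, DisTrO's reduced communication volume is what would make each round cheap enough to run over ordinary internet links.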

Summary
  • Researchers have developed a new optimization technique called DisTrO that reduces data exchange between GPUs by up to 10,000 times when training large AI models.
  • DisTrO reduces the bandwidth required to pre-train a 1.2 billion-parameter language model from 74.4 GB to 86.8 MB per training step. This could enable training over standard Internet connections without the need for dedicated high-speed links.
  • The method could democratize the training of large AI models by enabling researchers and organizations with limited resources to participate in the development of state-of-the-art models. The researchers also see potential for applications such as federated learning.