A new optimization technique called DisTrO reduces communication between GPUs during AI training by up to 10,000 times. This breakthrough could make it possible to train large language models over standard internet connections.
Researchers have created DisTrO, a new family of optimizers that dramatically reduces data exchange between GPUs when training large AI models, including large language models (LLMs) and diffusion models.
Traditional distributed training requires synchronizing full gradients between all participating accelerators (GPUs, TPUs) after each training step. This process demands extremely high bandwidth and specialized high-speed connections.
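To make that cost concrete, here is a minimal sketch of the conventional synchronization step using PyTorch's distributed all-reduce. This illustrates the baseline DisTrO improves on, not DisTrO itself; the single-process "gloo" group, the tiny stand-in model, and the batch size are arbitrary choices so the snippet runs on one machine.

```python
import os
import torch
import torch.distributed as dist

# Sketch of conventional data-parallel gradient synchronization: the
# per-step exchange whose bandwidth DisTrO is designed to cut.
# A single-process "gloo" group is used so this runs locally; in a real
# cluster every worker would run this after each backward pass.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = torch.nn.Linear(1024, 1024)               # stand-in for a large model
loss = model(torch.randn(32, 1024)).pow(2).mean()
loss.backward()

# Every parameter's full gradient crosses the network on every step.
for p in model.parameters():
    dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
    p.grad /= dist.get_world_size()

traffic = sum(p.grad.numel() * p.grad.element_size() for p in model.parameters())
print(f"~{traffic / 1e6:.1f} MB of gradients exchanged this step")

dist.destroy_process_group()
```

For a billion-parameter model, that full-gradient exchange amounts to gigabytes per worker on every training step, which is why conventional setups rely on specialized interconnects.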
DisTrO slashes these communication requirements by four to five orders of magnitude. During the pre-training of a 1.2-billion-parameter language model, the required bandwidth per training step dropped from 74.4 GB to just 86.8 MB, an 857-fold reduction.
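Those figures are easy to sanity-check. A quick back-of-the-envelope calculation (treating GB and MB as decimal units, 10^9 and 10^6 bytes) reproduces the reported reduction factor:

```python
# Sanity check of the reported per-step figures for the 1.2B-parameter run
# (GB/MB read as decimal units; the byte counts are the paper's own numbers).
full_sync = 74.4e9   # bytes per step with conventional full-gradient sync
distro    = 86.8e6   # bytes per step with DisTrO
print(f"reduction factor: {full_sync / distro:.0f}x")   # ~857x
```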
The team reports that reductions of up to 10,000 times are possible during fine-tuning. DisTrO works independently of network topology and neural network architecture.
DisTrO aims to make AI training more accessible
The researchers believe DisTrO could democratize the training of large AI models. The drastically reduced bandwidth requirements could enable model training via normal internet connections, eliminating the need for specialized high-speed links.
This advancement could allow researchers and organizations with limited resources to participate in developing state-of-the-art AI models. Until now, this capability has been limited to governments and large tech companies in wealthy countries with the necessary funding and infrastructure.
The team suggests DisTrO could enable a fully decentralized training network. The method is highly resilient to node failures or degradation and can easily accommodate new nodes joining.
The researchers also see great potential for applications like federated learning, where models are trained collaboratively while keeping training data private and decentralized. DisTrO could make federated learning practical for efficiently training LLMs over the internet.