A new open-source system enables the training of 70-billion-parameter language models on gaming GPUs.
An open-source system released by Answer.AI makes it possible, for the first time, to efficiently train 70-billion-parameter language models on ordinary desktop computers equipped with standard gaming graphics cards. The system combines FSDP and QLoRA and is the result of a collaboration between Answer.AI, Hugging Face, and other researchers.
The challenge in training large language models lies in the limited memory of standard graphics cards, which top out at 24 GB, compared with expensive data center cards that offer up to 80 GB.
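To put those numbers in perspective, a rough back-of-the-envelope estimate (not taken from the article) shows why a 70-billion-parameter model does not fit on a single gaming card even before optimizer states and activations are counted:

```python
# Rough memory estimate for storing only the weights of a 70B-parameter model.
params = 70e9

print(f"16-bit weights: {params * 2 / 1e9:.0f} GB")    # ~140 GB
print(f"4-bit weights:  {params * 0.5 / 1e9:.0f} GB")  # ~35 GB

# A gaming GPU offers at most 24 GB, and two of them 48 GB combined, so even
# aggressive 4-bit quantization still requires splitting the model across GPUs.
```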
QLoRA, developed by Tim Dettmers, one of the researchers involved, enables training of larger models on a single GPU by combining quantization and LoRA. Quantization reduces the number of bits used to store a neural network's parameters, while LoRA trains small adapter layers on top of the model without changing the full set of weights.
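As an illustration of the idea (a sketch using the Hugging Face transformers and peft libraries, not the Answer.AI code itself; the checkpoint name and LoRA settings are placeholder choices):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Quantization: store the frozen base weights in 4-bit NF4 format.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",  # placeholder checkpoint
    quantization_config=bnb_config,
)

# LoRA: add small trainable adapter matrices; the quantized base stays frozen.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapters are updated in training
```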
FSDP (Fully Sharded Data Parallel), developed by Meta's PyTorch team, distributes a model across multiple GPUs so that all of the graphics cards can be used simultaneously. The technique shards a large model's parameters and gathers the shards each GPU needs just in time during training.
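The basic usage pattern looks roughly like the following plain-PyTorch sketch (a toy model launched with torchrun, not the project's actual training code; the layer sizes and learning rate are arbitrary placeholders):

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Launch with: torchrun --nproc_per_node=2 fsdp_sketch.py
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank())

model = torch.nn.Sequential(  # stand-in for a large transformer
    torch.nn.Linear(4096, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 4096),
).cuda()

# FSDP shards the parameters across all processes; each GPU stores only its
# shard and gathers the others just in time for the forward and backward pass.
model = FSDP(model)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
x = torch.randn(8, 4096, device="cuda")
loss = model(x).sum()
loss.backward()
optimizer.step()
```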
Team successfully trains 70-billion-parameter model on two GPUs
By combining QLoRA and FSDP, the team was able to train a 70-billion-parameter model on two 24 GB GPUs. Techniques such as gradient checkpointing and CPU offloading were also used to reduce GPU memory requirements. The team further cut memory consumption with HQQ, a method that enables faster and more accurate quantization than previous approaches and that was integrated into the FSDP system.
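The two PyTorch-level memory-saving techniques mentioned above can be sketched as follows (a minimal toy example, not the Answer.AI training script; the HQQ integration is not shown, and the model and sizes are placeholders):

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, CPUOffload
from torch.utils.checkpoint import checkpoint

# Launch with: torchrun --nproc_per_node=2 offload_sketch.py
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank())

# Gradient checkpointing: discard a block's intermediate activations in the
# forward pass and recompute them during backward, trading compute for memory.
block = torch.nn.Sequential(torch.nn.Linear(4096, 4096), torch.nn.GELU()).cuda()
x = torch.randn(8, 4096, device="cuda", requires_grad=True)
out = checkpoint(block, x, use_reentrant=False)
out.sum().backward()

# CPU offloading: FSDP parks each parameter shard in CPU RAM and copies it to
# the GPU only while the corresponding layer is running.
model = torch.nn.Sequential(  # stand-in for a large transformer, kept on CPU
    torch.nn.Linear(4096, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 4096),
)
sharded_model = FSDP(model, cpu_offload=CPUOffload(offload_params=True))
```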
The goal is to make AI more accessible and to enable more people not only to use valuable models but also to create them. In the future, the method, together with newer graphics cards, could potentially be used to train even larger AI models.