Summary

A new open-source system enables the training of 70-billion-parameter language models on gaming GPUs.

An open-source system released by Answer.AI makes it possible for the first time to efficiently train language models with 70 billion parameters on conventional desktop computers with standard gaming graphics cards. The system combines FSDP and QLoRA technologies and is the result of a collaboration between Answer.AI, Hugging Face, and other researchers.

The challenge of training large language models lies in the limited memory of standard graphics cards, which offer at most 24 GB, compared with expensive data center cards that provide up to 80 GB.
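A rough back-of-envelope calculation (illustrative numbers, not figures from the article) shows why 24 GB cards are such a tight fit for a 70-billion-parameter model:

```python
# Rough weight-memory estimate for a 70B-parameter model (illustrative numbers only).
params = 70e9

bytes_fp16 = params * 2      # 16-bit weights: ~140 GB
bytes_4bit = params * 0.5    # 4-bit quantized weights: ~35 GB

print(f"fp16 weights:  {bytes_fp16 / 1e9:.0f} GB")   # far beyond a single 24 GB card
print(f"4-bit weights: {bytes_4bit / 1e9:.0f} GB")   # still > 24 GB, but fits across two such cards
# Optimizer states, gradients and activations add further memory on top of the weights.
```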

QLoRA, developed by Tim Dettmers, one of the researchers involved, enables the training of larger models on a single GPU through the use of quantization and LoRA. Quantization reduces the number of bits used to store the parameters of a neural network, while LoRA trains small adapter matrices instead of updating all of the model's weights.
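As a rough illustration of the QLoRA recipe, here is a minimal sketch using the Hugging Face transformers and peft libraries; the model name and LoRA hyperparameters are placeholder choices, not the settings used by the team:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the base model with 4-bit quantized weights (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",  # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach small trainable LoRA adapters; the quantized base weights stay frozen.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of parameters is trained
```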


FSDP (Fully Sharded Data Parallel) from Meta's PyTorch team, on the other hand, makes it possible to distribute a model across multiple GPUs and use all of the graphics cards simultaneously. The technique splits the parameters of a large model into shards spread across the GPUs and gathers the shards each GPU needs on demand during training.
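Schematically, wrapping a model in FSDP looks something like the following minimal sketch against PyTorch's public FSDP API, launched with torchrun; the small stand-in model is illustrative, not the 70B setup:

```python
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Launched e.g. with: torchrun --nproc_per_node=2 train.py
dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Transformer(d_model=512, num_encoder_layers=6).cuda()  # stand-in model

# FSDP shards the parameters across all GPUs and gathers each shard
# on demand for the forward and backward pass.
model = FSDP(model)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
```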

Team successfully trains 70-billion-parameter model on two GPUs

By combining QLoRA and FSDP, the team was able to train a model with 70 billion parameters on two 24 GB GPUs. In addition, techniques such as gradient checkpointing and CPU offloading were used to reduce GPU memory requirements. The team further reduced memory consumption with HQQ, a quantization method that is faster and more accurate than previous approaches, and integrated it into the FSDP setup.
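The memory-saving options mentioned here are exposed through standard PyTorch and Hugging Face APIs. The sketch below, continuing from the FSDP example above (process group already initialized), shows how gradient checkpointing and CPU offloading can be switched on; it is illustrative only, not the team's actual training script:

```python
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, CPUOffload
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model

# Gradient checkpointing: recompute activations in the backward pass
# instead of keeping them all in GPU memory.
model.gradient_checkpointing_enable()

# FSDP shards the weights across GPUs; CPU offloading additionally parks
# sharded parameters in system RAM while they are not in use.
model = FSDP(model, cpu_offload=CPUOffload(offload_params=True))
```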

The goal is to make AI more accessible and enable more people to not only use but also create valuable models. Potentially, the method and newer graphics cards could be used to train even larger AI models in the future.

Summary
  • Answer.AI has released an open-source system that, by combining FSDP and QLoRA technologies, makes it possible for the first time to train language models with 70 billion parameters on conventional desktop computers with standard gaming graphics cards.
  • QLoRA enables the training of large models on a single GPU through quantization and LoRA, while FSDP from Meta's PyTorch team distributes a model across multiple GPUs.
  • The team successfully trained a model with 70 billion parameters on two 24 GB GPUs, using additional techniques such as gradient checkpointing and CPU offloading to reduce GPU memory requirements.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.