Summary

A new open-source system enables the training of 70-billion-parameter language models on gaming GPUs.

An open-source system released by Answer.AI makes it possible for the first time to efficiently train language models with 70 billion parameters on conventional desktop computers with standard gaming graphics cards. The system combines FSDP and QLoRA technologies and is the result of a collaboration between Answer.AI, Hugging Face, and other researchers.

The challenge of training large language models lies in the limited memory of standard graphics cards, which offer at most 24 GB, compared with expensive data center cards that provide up to 80 GB.
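A rough back-of-envelope calculation (illustrative numbers, not figures from the article) shows why 24 GB cards are such a tight fit for a 70-billion-parameter model:

```python
# Rough weight-memory estimate for a 70B-parameter model (illustrative numbers only).
params = 70e9

bytes_fp16 = params * 2      # 16-bit weights: ~140 GB
bytes_4bit = params * 0.5    # 4-bit quantized weights: ~35 GB

print(f"fp16 weights:  {bytes_fp16 / 1e9:.0f} GB")   # far beyond a single 24 GB card
print(f"4-bit weights: {bytes_4bit / 1e9:.0f} GB")   # still > 24 GB, but fits across two such cards
# Optimizer states, gradients and activations add further memory on top of the weights.
```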

QLoRA, developed by Tim Dettmers, one of the researchers involved, enables the training of larger models on a single GPU through the use of quantization and LoRA. Quantization reduces the number of bits used to store the parameters of a neural network, while LoRA trains small adapter matrices instead of updating all of the model's weights.
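As a rough illustration of the QLoRA recipe, here is a minimal sketch using the Hugging Face transformers and peft libraries; the model name and LoRA hyperparameters are placeholder choices, not the settings used by the team:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the base model with 4-bit quantized weights (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",  # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach small trainable LoRA adapters; the quantized base weights stay frozen.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of parameters is trained
```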


FSDP (Fully Sharded Data Parallel) from Meta's PyTorch team, on the other hand, makes it possible to distribute a model across multiple GPUs and use all of the graphics cards simultaneously. The technique splits the parameters of a large model into shards spread across the GPUs and gathers the shards each GPU needs on demand during training.
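Schematically, wrapping a model in FSDP looks something like the following minimal sketch against PyTorch's public FSDP API, launched with torchrun; the small stand-in model is illustrative, not the 70B setup:

```python
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Launched e.g. with: torchrun --nproc_per_node=2 train.py
dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Transformer(d_model=512, num_encoder_layers=6).cuda()  # stand-in model

# FSDP shards the parameters across all GPUs and gathers each shard
# on demand for the forward and backward pass.
model = FSDP(model)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
```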

Team successfully trains 70-billion-parameter model on two GPUs

By combining QLoRA and FSDP, the team was able to train a model with 70 billion parameters on two 24 GB GPUs. In addition, techniques such as gradient checkpointing and CPU offloading were used to reduce GPU memory requirements. The team further reduced memory consumption with HQQ, a quantization method that is faster and more accurate than previous approaches, and integrated it into the FSDP setup.
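The memory-saving options mentioned here are exposed through standard PyTorch and Hugging Face APIs. The sketch below, continuing from the FSDP example above (process group already initialized), shows how gradient checkpointing and CPU offloading can be switched on; it is illustrative only, not the team's actual training script:

```python
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, CPUOffload
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model

# Gradient checkpointing: recompute activations in the backward pass
# instead of keeping them all in GPU memory.
model.gradient_checkpointing_enable()

# FSDP shards the weights across GPUs; CPU offloading additionally parks
# sharded parameters in system RAM while they are not in use.
model = FSDP(model, cpu_offload=CPUOffload(offload_params=True))
```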

The goal is to make AI more accessible and enable more people to not only use but also create valuable models. Potentially, the method and newer graphics cards could be used to train even larger AI models in the future.

Summary
  • Answer.AI has released an open-source system that, by combining FSDP and QLoRA technologies, makes it possible for the first time to train language models with 70 billion parameters on conventional desktop computers with standard gaming graphics cards.
  • QLoRA enables the training of large models on a single GPU through quantization and LoRA, while FSDP from Meta's PyTorch team distributes a model across multiple GPUs.
  • The team successfully trained a model with 70 billion parameters on two 24 GB GPUs, using additional techniques such as gradient checkpointing and CPU offloading to reduce GPU memory requirements.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.