
AMD has released its first open-source language model with one billion parameters. The model builds on the existing open-source OLMo architecture but uses significantly less training data.


While based on the same open-source architecture, AMD's OLMo differs from the original in key respects. According to AMD, the model was trained with less than half the training tokens used for the original OLMo, yet achieves comparable performance.

The three-stage development of AMD's OLMo 1B model shows its evolution from the base language model through chat optimization to the final alignment with human preferences. Each phase uses specific datasets to enhance AI capabilities. | Image: AMD

AMD's version of OLMo went through a three-stage training process. In the first stage, the base model was pre-trained on 1.3 trillion tokens across 16 server nodes, each equipped with four AMD Instinct MI250 GPUs.

The second phase involved two-step supervised fine-tuning with various datasets to improve capabilities in areas like science, programming, and mathematics. The third phase consisted of human preference alignment based on the UltraFeedback dataset.
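To make the pipeline concrete, here is a minimal sketch of how the two post-training stages could be reproduced with Hugging Face's TRL library. The model ID, the SFT dataset, and all hyperparameters are illustrative assumptions; AMD only names UltraFeedback for the alignment stage, and exact trainer arguments vary between TRL versions.

```python
# Minimal sketch of the SFT + DPO stages with Hugging Face TRL.
# Model ID, SFT dataset, and hyperparameters are illustrative assumptions,
# not AMD's published training configuration.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer, DPOConfig, DPOTrainer

model_id = "amd/AMD-OLMo-1B"  # assumed Hugging Face repo name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Stage 2: supervised fine-tuning on an instruction dataset (placeholder choice).
sft_dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
sft_trainer = SFTTrainer(
    model=model,
    train_dataset=sft_dataset,
    args=SFTConfig(output_dir="olmo-1b-sft"),
)
sft_trainer.train()

# Stage 3: DPO alignment on preference pairs; AMD cites the UltraFeedback dataset.
dpo_dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
dpo_trainer = DPOTrainer(
    model=sft_trainer.model,
    args=DPOConfig(output_dir="olmo-1b-dpo", beta=0.1),
    train_dataset=dpo_dataset,
    processing_class=tokenizer,  # `tokenizer=` in older TRL versions
)
dpo_trainer.train()
```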


Strong performance against competitors

According to AMD, the final OLMo model outperforms other open-source chat models in several benchmarks by an average of 2.6 percent.

The performance comparisons of different LLM models show remarkable improvements by AMD OLMo 1B, with increases of up to 6.36 percent in certain benchmarks. | Image: AMD

The two-step supervised fine-tuning brought notable improvements: MMLU accuracy increased by 5.09 percent, while GSM8k scores improved by 15.32 percent.

AMD says a key feature of OLMo is its compatibility with various hardware platforms. Beyond data center use, the model can run on laptops with AMD's Ryzen AI processors and integrated Neural Processing Units (NPUs).

The model, training data and code are available on Hugging Face.
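For anyone who wants to try it, loading the released weights with the transformers library might look like the following minimal sketch; the repo name amd/AMD-OLMo-1B-SFT-DPO is an assumption based on AMD's naming of the aligned variant, not something the article confirms.

```python
# Minimal sketch of loading and prompting the model with transformers.
# The repo name below is an assumption, not confirmed by the article.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amd/AMD-OLMo-1B-SFT-DPO"  # assumed name of the aligned variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

prompt = "Explain in one sentence what a neural processing unit (NPU) does."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```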

AMD's major AI investment push

The release of OLMo is part of AMD's broader AI strategy. The company reported in July that it invested over $125 million in a dozen AI companies over the past twelve months. Recently, AMD acquired Finnish AI company Silo AI for $665 million and open-source AI startup Nod.ai.


At the same time, AMD is advancing the development of specialized AI hardware. With the Instinct MI355X AI accelerator, announced for 2025, the company aims to compete directly with Nvidia.

Summary
  • AMD has released its first open-source language model with one billion parameters. It is based on the OLMo architecture but was trained with less than half the number of training tokens, and still achieves comparable performance.
  • The three training stages comprised pre-training the base model, two-step supervised fine-tuning for specific skills, and alignment with human preferences. In benchmarks, AMD's OLMo outperformed other open-source chat models by an average of 2.6 percent.
  • The release of OLMo is part of AMD's broader AI strategy, which includes investments in AI companies, acquisitions, and the development of specialized AI hardware such as the Instinct MI355X accelerator planned for 2025, with which AMD aims to compete directly with Nvidia.
Jonathan works as a freelance tech journalist for THE DECODER, focusing on AI tools and how GenAI can be used in everyday work.