
The Allen Institute for AI (Ai2) has launched OLMo 3, a new line of fully open AI models. This release includes the first open 32B "thinking" model, designed to make its reasoning process visible while running 2.5 times more efficiently than similar models.


The OLMo 3 family comes in three versions: OLMo 3-Base (7B and 32B), OLMo 3-Think (7B and 32B), and OLMo 3-Instruct (7B). Each model supports a 65,000-token context window, roughly 16 times larger than OLMo 2's.

Ai2 says this is the first time researchers and developers get access to everything from training data to deployment. Every training step, checkpoint, and dataset is open for inspection, and users can trace individual reasoning steps back to the exact data that produced them.

Efficiency gains without sacrificing performance

According to Ai2, the OLMo 3-Base 7B model is trained with 2.5 times the compute efficiency of Meta’s Llama-3.1-8B, measured by GPU hours per token. Despite the efficiency boost, OLMo 3 models are said to achieve performance that rivals much larger systems. OLMo 3 outperforms open competitors like Apertus-70B and SmolLM 3 on reasoning, comprehension, and long-context benchmarks.
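Ai2's "GPU hours per token" metric can be made concrete with a small sketch. The figures below are purely illustrative placeholders, not Ai2's or Meta's published numbers; they only show how a 2.5x efficiency ratio would be computed under this metric:

```python
def gpu_hours_per_token(gpu_hours: float, tokens: float) -> float:
    """Compute cost per training token: total GPU hours / tokens seen."""
    return gpu_hours / tokens

# Hypothetical placeholder values (NOT published figures):
llama_cost = gpu_hours_per_token(1.5e6, 15e12)  # assumed Llama-3.1-8B-style run
olmo_cost = gpu_hours_per_token(0.24e6, 6e12)   # assumed OLMo 3-style run

# A 2.5x efficiency claim means the baseline spends 2.5x the GPU hours
# for every token it trains on.
print(f"relative efficiency: {llama_cost / olmo_cost:.2f}x")
```

Lower GPU hours per token means cheaper training at the same data scale, which is the sense in which Ai2 frames the comparison.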


CEO Ali Farhadi explained that "high performance doesn't have to come at high cost" and that the system demonstrates how "responsible, sustainable AI can scale without compromise." Here's how OLMo 3-Think stacks up on benchmarks:

| Skill | Benchmark | Olmo 3-Think (32B) | Qwen 3 32B | Qwen 3 VL 32B Thinking | Gemma 3 27B Instruct | DeepSeek R1 Distill 32B |
|---|---|---|---|---|---|---|
| Math | MATH | 96.1 ▲ | 95.4 | 96.7 | 87.4 | 92.6 |
| | AIME 2024 | 76.8 | 80.8 | 86.3 | 28.9 | 70.3 |
| | AIME 2025 | 72.5 | 70.9 | 78.8 | 22.9 | 56.3 |
| | OMEGA | 50.8 ▲ | 47.7 | 50.8 | 24.0 | 38.9 |
| Reasoning | BigBenchHard | 89.8 ▲ | 90.6 | 91.1 | 82.4 | 89.7 |
| | ZebraLogic | 76.0 | 88.3 | 96.1 | 24.8 | 69.4 |
| | AGI Eval English | 88.2 | 90.0 | 92.2 | 76.9 | 88.1 |
| Coding | HumanEvalPlus | 91.4 ▲ | 91.2 | 90.6 | 79.2 | 92.3 |
| | MBPP+ | 68.0 | 70.6 | 66.2 | 65.7 | 70.1 |
| | LiveCodeBench v3 | 83.5 | 90.2 | 84.8 | 39.0 | 79.5 |
| IF | IFEval | 89.0 ★ | 86.5 | 85.5 | 85.4 | 78.7 |
| | IFBench | 47.6 | 37.3 | 55.1 | 31.3 | 23.8 |
| Knowledge & QA | MMLU | 85.4 | 88.8 | 90.1 | 74.6 | 88.0 |
| | PopQA | 31.9 ▲ | 30.7 | 32.2 | 30.2 | 26.7 |
| | GPQA | 58.1 | 67.3 | 67.4 | 45.0 | 61.8 |
| Chat | AlpacaEval 2 LC | 74.2 | 75.6 | 80.9 | 65.5 | 26.2 |
| Safety | Safety | 68.8 | 69.0 | 82.7 | 68.6 | 63.6 |

(★ indicates Olmo won the category; ▲ indicates Olmo is within 2.0 points of the top score. Additional comparisons are available in the full report.)
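The ★/▲ markers follow a simple rule: ★ for an outright category win, ▲ for a score within 2.0 points of the top. A minimal sketch of that logic, applied to the MATH and IFEval rows from the table above:

```python
def marker(scores: dict, model: str, margin: float = 2.0) -> str:
    """Return '★' if `model` holds the single top score, '▲' if it is
    within `margin` points of the top score, '' otherwise."""
    top = max(scores.values())
    score = scores[model]
    if score == top and sum(1 for s in scores.values() if s == top) == 1:
        return "★"
    return "▲" if top - score <= margin else ""

# Scores copied from the table's MATH and IFEval rows.
math_row = {"Olmo 3-Think 32B": 96.1, "Qwen 3 32B": 95.4,
            "Qwen 3 VL 32B": 96.7, "Gemma 3 27B": 87.4, "DeepSeek R1 32B": 92.6}
ifeval_row = {"Olmo 3-Think 32B": 89.0, "Qwen 3 32B": 86.5,
              "Qwen 3 VL 32B": 85.5, "Gemma 3 27B": 85.4, "DeepSeek R1 32B": 78.7}

print(marker(math_row, "Olmo 3-Think 32B"))    # ▲ (0.6 points behind Qwen 3 VL)
print(marker(ifeval_row, "Olmo 3-Think 32B"))  # ★ (top score)
```

Note that a tie at the top (as in the OMEGA row) yields ▲ rather than ★, matching the table's markings.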

Bringing transparency to reasoning models

OLMo 3-Think is the first fully open model to generate explicit, step-by-step reasoning chains. Until now, this kind of visible logic was limited to closed systems like OpenAI’s o1 series. With OLMo 3, users can see exactly how the model reaches its conclusions and follow the entire process from data to output. The new models are available for testing in the Ai2 Playground.

Most so-called open-source models only release their weights, keeping their datasets and training process private. These are really "open weights" models, offering only partial transparency. The best open-weight reasoning models, like Kimi K2 Thinking from Moonshot AI, have mostly come from China. OLMo 3 goes further by opening up the full pipeline.

Open tools for custom training and evaluation

OLMo 3 is trained on the Dolma 3 dataset, which contains six trillion tokens from web content, scientific papers, and code. Ai2 also released the Dolci Suite for fine-tuning reasoning skills and OLMES for reproducible model evaluation.


All models are released under the Apache 2.0 license and are available on Hugging Face and in the Ai2 Playground. Teams can fine-tune these models for new domains, experiment with different training goals, or build on the published checkpoints.

Earlier this year, Ai2’s OLMo 2 32B matched the performance of commercial models like GPT-4o mini while using only about a third of the compute resources. OLMo 3 continues this work, focusing on further improvements in openness, efficiency, and transparency.

Summary
  • The Allen Institute for AI has launched OLMo 3, a new generation of fully transparent AI models, including the first openly available 32B model that reveals its reasoning processes.
  • OLMo 3 offers complete transparency on training steps, checkpoints, and datasets.
  • Built on the Dolma 3 dataset, the models are openly licensed and can be accessed immediately on Hugging Face and the Ai2 Playground.
Jonathan writes for THE DECODER about how AI tools can improve both work and creative projects.