The Allen Institute for AI (Ai2) has launched OLMo 3, a new line of fully open AI models. The release includes the first fully open 32B "thinking" model, designed to make its reasoning process visible while being trained roughly 2.5 times more efficiently than comparable models.
The OLMo 3 family comes in three versions: OLMo 3-Base (7B and 32B), OLMo 3-Think (7B and 32B), and OLMo 3-Instruct (7B). Each model supports a context window of roughly 65,000 tokens, 16 times larger than that of the previous OLMo 2.
Ai2 says this is the first time researchers and developers get access to everything from training data to deployment. Every training step, checkpoint, and dataset is open for inspection, and users can trace individual reasoning steps back to the exact data that produced them.
Efficiency gains without sacrificing performance
According to Ai2, the OLMo 3-Base 7B model is trained with 2.5 times the compute efficiency of Meta’s Llama-3.1-8B, measured by GPU hours per token. Despite the efficiency boost, OLMo 3 models are said to achieve performance that rivals much larger systems. OLMo 3 outperforms open competitors like Apertus-70B and SmolLM 3 on reasoning, comprehension, and long-context benchmarks.
CEO Ali Farhadi explained that "high performance doesn't have to come at high cost" and that the system demonstrates how "responsible, sustainable AI can scale without compromise." Here’s how the reasoning model, OLMo 3-Think (32B), stacks up on benchmarks:
| Skill | Benchmark | Olmo 3-Think (32B) | Qwen 3 32B | Qwen 3 VL 32B Thinking | Gemma 3 27B Instruct | DeepSeek R1 Distill 32B |
|---|---|---|---|---|---|---|
| Math | MATH | 96.1 ▲ | 95.4 | 96.7 | 87.4 | 92.6 |
| | AIME 2024 | 76.8 | 80.8 | 86.3 | 28.9 | 70.3 |
| | AIME 2025 | 72.5 | 70.9 | 78.8 | 22.9 | 56.3 |
| | OMEGA | 50.8 ▲ | 47.7 | 50.8 | 24.0 | 38.9 |
| Reasoning | BigBenchHard | 89.8 ▲ | 90.6 | 91.1 | 82.4 | 89.7 |
| | ZebraLogic | 76.0 | 88.3 | 96.1 | 24.8 | 69.4 |
| | AGI Eval English | 88.2 | 90.0 | 92.2 | 76.9 | 88.1 |
| Coding | HumanEvalPlus | 91.4 ▲ | 91.2 | 90.6 | 79.2 | 92.3 |
| | MBPP+ | 68.0 | 70.6 | 66.2 | 65.7 | 70.1 |
| | LiveCodeBench v3 | 83.5 | 90.2 | 84.8 | 39.0 | 79.5 |
| Instruction following | IFEval | 89.0 ★ | 86.5 | 85.5 | 85.4 | 78.7 |
| | IFBench | 47.6 | 37.3 | 55.1 | 31.3 | 23.8 |
| Knowledge & QA | MMLU | 85.4 | 88.8 | 90.1 | 74.6 | 88.0 |
| | PopQA | 31.9 ▲ | 30.7 | 32.2 | 30.2 | 26.7 |
| | GPQA | 58.1 | 67.3 | 67.4 | 45.0 | 61.8 |
| Chat | AlpacaEval 2 LC | 74.2 | 75.6 | 80.9 | 65.5 | 26.2 |
| Safety | Safety | 68.8 | 69.0 | 82.7 | 68.6 | 63.6 |
(★ indicates Olmo won the category; ▲ indicates Olmo is within 2.0 points of the top score. Additional comparisons are available in the full report.)
Bringing transparency to reasoning models
OLMo 3-Think is the first fully open model to generate explicit, step-by-step reasoning chains. Until now, this kind of visible reasoning was mostly limited to closed systems like OpenAI’s o1 series. With OLMo 3, users can see exactly how the model reaches its conclusions and follow the entire process from data to output. The new models are available for testing in the Ai2 Playground.
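To make the idea of a visible reasoning chain concrete, here is a minimal sketch of how a developer might separate a thinking trace from the final answer. It assumes the model wraps its chain of thought in `<think>` tags, a common convention among open "thinking" models; OLMo 3-Think's exact output format should be confirmed against its model card.

```python
# Minimal sketch: splitting a visible reasoning trace from the final answer.
# Assumption: the model emits its chain of thought inside <think>...</think>
# tags, a common convention for open "thinking" models. Check the OLMo 3-Think
# model card for the actual format.
import re

def split_reasoning(completion: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from a model completion."""
    match = re.search(r"<think>(.*?)</think>", completion, flags=re.DOTALL)
    if match:
        return match.group(1).strip(), completion[match.end():].strip()
    return "", completion.strip()

sample = "<think>3 workers * 17 widgets = 51 widgets.</think>They produce 51 widgets."
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```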
Most so-called open-source models only release their weights, keeping their datasets and training process private. These are really "open weights" models, offering only partial transparency. The best open-weight reasoning models, like Kimi K2 Thinking from Moonshot AI, have mostly come from China. OLMo 3 goes further by opening up the full pipeline.
Open tools for custom training and evaluation
OLMo 3 is trained on the Dolma 3 dataset, which contains six trillion tokens from web content, scientific papers, and code. Ai2 also released the Dolci Suite for fine-tuning reasoning skills and OLMES for reproducible model evaluation.
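For anyone who wants to inspect the training data directly, a rough sketch with the Hugging Face `datasets` library could look like the following; the dataset id `allenai/dolma3` is an assumption used here for illustration, so check Ai2's Hugging Face organization for the actual Dolma 3 repository name.

```python
# Rough sketch: streaming a few documents from a pretraining corpus with the
# Hugging Face `datasets` library, without downloading the full corpus.
# Assumption: the dataset id below is illustrative; look up the real Dolma 3
# repository name on Ai2's Hugging Face page.
from datasets import load_dataset

dataset = load_dataset("allenai/dolma3", split="train", streaming=True)

# Print the first 200 characters of the first three documents.
for i, doc in enumerate(dataset):
    print(doc.get("text", "")[:200], "\n---")
    if i >= 2:
        break
```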
All models are released under the Apache 2.0 license and are available on Hugging Face and in the Ai2 Playground. Teams can fine-tune these models for new domains, experiment with different training goals, or build on the published checkpoints.
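As a starting point, loading one of the checkpoints with the Hugging Face `transformers` library could look roughly like this; the repository id below is an assumption for illustration, and the exact model names should be verified on Ai2's Hugging Face page.

```python
# Rough sketch: loading an OLMo 3 checkpoint and generating a reply with
# Hugging Face transformers. The model id is an assumption for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/Olmo-3-7B-Instruct"  # illustrative id; verify on Hugging Face

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Build a chat prompt using the model's own chat template, then generate.
messages = [{"role": "user", "content": "In one sentence, what makes OLMo 3 fully open?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The 7B variants should fit on a single modern GPU in 16-bit precision, while the 32B models will generally require multiple GPUs or quantization.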
Earlier this year, Ai2’s OLMo 2 32B matched the performance of commercial models like GPT-4o mini while using only about a third of the compute resources. OLMo 3 continues this work, focusing on further improvements in openness, efficiency, and transparency.