Google DeepMind has claimed its first gold medal at the International Mathematical Olympiad (IMO) with an "advanced version" of its Gemini model running in Deep Think mode.
The system solved five out of six problems in algebra, combinatorics, geometry, and number theory, earning 35 out of 42 possible points—enough for a gold medal, which only about eight percent of human participants achieve, according to the IMO. DeepMind says the solutions (PDF download) were reviewed by official IMO judges and described as "clear, precise and most of them easy to follow."
What makes this win stand out is the method: last year, DeepMind relied on formal languages like Lean and needed days of computation with AlphaProof and AlphaGeometry, but this time, Gemini Deep Think worked entirely in natural language.
The model produced full proofs directly from the official IMO problems, all within the four-and-a-half-hour time limit per session and without external tools or symbolic aids. DeepMind notes that Gemini faced the same problems and time constraints as human competitors.
The IMO model runs on the new "Deep Think" mode of Gemini 2.5 Pro, which Google introduced in May for complex reasoning tasks. This mode lets the model follow multiple hypotheses in parallel before generating an answer and is currently being tested with select users. For comparison, the standard Gemini 2.5 Pro scored only 31.5 percent of the possible points on the Olympiad's problems.
Gemini Deep Think was trained with specialized reinforcement learning methods to encourage multi-step reasoning, problem-solving, and theorem-proving. The IMO version also had more "thinking time," access to a curated set of high-quality solutions from previous IMO tasks, and general guidance on tackling these kinds of problems. DeepMind says these methods helped the model follow and combine several solution paths in parallel before settling on a final answer.
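DeepMind hasn't published implementation details, so it is unclear exactly how this parallel exploration works under the hood. Conceptually, though, "follow several solution paths, then commit to one answer" resembles familiar test-time strategies such as self-consistency voting or best-of-n selection. The sketch below is a minimal, hypothetical illustration of that general pattern; generate_candidate, score_candidate, and deep_think_style_answer are made-up stand-ins, not DeepMind's API or method.

```python
import concurrent.futures
from collections import Counter

# Hypothetical stand-ins: a real system would sample full natural-language
# proofs from a reasoning model. These stubs only illustrate the pattern.
def generate_candidate(problem: str, seed: int) -> str:
    """Produce one candidate solution path (stubbed for illustration)."""
    return f"candidate proof #{seed % 3} for: {problem}"

def score_candidate(candidate: str) -> float:
    """Assign a plausibility score to a candidate (stubbed for illustration)."""
    # A real system might use a learned verifier or a self-evaluation pass.
    return float(len(candidate))

def deep_think_style_answer(problem: str, n_paths: int = 8) -> str:
    """Explore several solution paths in parallel, then pick one final answer."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        candidates = list(pool.map(lambda s: generate_candidate(problem, s),
                                   range(n_paths)))

    # Option A: if several paths agree on the same answer, take the majority
    # (self-consistency voting).
    most_common, votes = Counter(candidates).most_common(1)[0]
    if votes > 1:
        return most_common

    # Option B: otherwise fall back to the highest-scoring candidate (best-of-n).
    return max(candidates, key=score_candidate)

if __name__ == "__main__":
    print(deep_think_style_answer("IMO 2025, Problem 1"))
```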
OpenAI also claims math gold
OpenAI announced its own IMO gold medal last weekend. According to OpenAI, one of its internal language models also solved five out of six Olympiad problems under competition conditions, with proofs reviewed by three former IMO gold medalists.
OpenAI says its model worked through two four-and-a-half-hour sessions with no internet access, code, or external tools—relying entirely on natural language. Like DeepMind, OpenAI notes that its model is a generalist reasoning system, not one trained exclusively for the IMO.
Until recently, this kind of result was considered nearly impossible. Even mathematician Terence Tao doubted in June that a language model could solve IMO problems in real time. The fact that two systems crossed this milestone at the same time marks a major shift.
A new phase for reasoning AI—with open questions
Both results suggest that advanced AI models with strong reasoning and reinforcement learning can now tackle complex math problems for hours at a stretch—without relying on symbolic tools.
However, both announcements leave questions unanswered. OpenAI hasn't shared any details about the model's architecture, training data, or the compute involved. DeepMind, for its part, hasn't said how scalable its Deep Think approach is or whether it transfers to other tasks and scientific fields. It's also unclear how consistently either system would perform on longer proofs or in other branches of mathematics.
Still, the results show that the approach works in practice, and for now, the details may matter less than the outcome. Sustained, accurate reasoning over hours has long been seen as a major hurdle for language models. The race for reasoning-capable AI is entering a new phase, and, at least in math, machines are moving much closer to human-level performance.