OpenAI researchers explain why math is the road to AGI
AI models have jumped from grade-school arithmetic to olympiad-level and research mathematics in only two years. In the OpenAI Podcast, OpenAI researchers Sebastian Bubeck and Ernest Ryu explain why math has become the key test on the road to artificial general intelligence.
Reasoning models didn't exist two years ago. Four years ago, Bubeck was impressed when Google's Minerva model could draw a line through points on a coordinate system. Today, he told Andrew Mayne, these systems are helping Fields Medal winners with their daily work. At a conference 18 months ago, 80 percent of the mathematicians in the room thought it was impossible for scaled-up LLMs to solve open research problems, Bubeck says.
Ernest Ryu, a former UCLA math professor, says he used ChatGPT to solve a 42-year-old open problem about Nesterov's method in optimization theory, in just twelve hours spread across three evenings. He had already spent more than 40 hours on it without AI and gotten nowhere. Ryu acted as a verifier, catching errors and steering the conversation in promising directions.
Why math has become the benchmark for AGI
For Bubeck, math isn't the yardstick for AGI progress by accident. It demands exactly the kind of capability a generally intelligent system needs. Mathematical proofs require long, consistent reasoning over hours, days, or even years, and a single mistake anywhere in the chain destroys the entire argument, no matter how correct the rest is. Anything that can handle that has to be able to spot and fix its own errors.
That's what the researchers want to carry over from math training into other fields, from biology to materials science. Bubeck draws a parallel with how people are educated: students learn math not because they'll go on to write proofs, but because the subject forces them to think logically.
Math also has practical advantages as a benchmark. Problems are clearly stated, answers can be checked, and nobody argues about whether a result is correct. Bubeck introduces the idea of "AGI time": two years ago, models could simulate a student's thinking for minutes. Today, they're up to days or even a week. The next target is weeks and months.
OpenAI's training methods aren't specific to math, Bubeck says, but general, which means progress in other sciences should follow. The researchers are building an "automated researcher" that can work on problems on its own over long stretches of time.
The Erdős problems and the fight over what they mean
Bubeck and Ryu also dig into the Erdős problems, a collection of open questions left behind by the late Hungarian mathematician. Bubeck says internal models initially found solutions to ten problems marked as open, mostly through deep literature searches. His misleading tweet about it sparked a public spat with Google DeepMind CEO Demis Hassabis, since many people read it as a claim that OpenAI had produced new proofs. By now, Bubeck says, ChatGPT and internal models have actually produced more than ten genuinely new solutions worthy of publication in academic journals.
What seemed like an unrealistic claim is now reality, and the pace is picking up. Bubeck sees this as evidence that the models are making the leap from recombining existing knowledge to producing new mathematics, even if the philosophical question of whether scientific progress is anything more than clever recombination plus a bit of reasoning remains open.
The risks: mental atrophy and fake proofs
Both researchers warn against using these tools superficially. Expertise matters more than ever, they argue, because only trained mathematicians can put the models to productive use. The long AI-generated proofs that non-mathematicians post on social media are usually wrong. Ryu sees the same pattern in programming, where a whole generation is losing the ability to use debuggers.
Bubeck says claims that scientists are no longer needed are therefore dangerous. Academic institutions need to actively reclaim their role. At the same time, AI can speed up proof verification, a process that currently takes years, and flag problems in published papers.