OpenAI manager Weil: "2026 will be for science what 2025 was for software engineering"
Key Points
- Kevin Weil, head of OpenAI's science team, says GPT-5 is already making researchers more productive and predicts 2026 will be a breakthrough year for AI in science, similar to what 2025 was for software engineering.
- After OpenAI executives had to delete posts falsely claiming GPT-5 solved unsolved math problems, Weil now strikes a humbler tone, emphasizing the model works best as a sparring partner rather than an oracle.
- OpenAI is working on giving GPT-5 "epistemological humility" to present ideas as suggestions, while mathematician Terence Tao estimates only one to two percent of open problems can currently be solved by AI with minimal human help.
Kevin Weil leads OpenAI's science team and believes GPT-5 is already making researchers more productive. For 2026, he expects significant progress - and more humility.
"These models are no longer just better than 90% of grad students," Weil says in an interview with MIT Technology Review about the latest generation of large language models. "They're really at the frontier of human abilities." As one example of this performance leap, he points to the GPQA benchmark, which tests PhD-level knowledge in biology, physics, and chemistry: GPT-4 scored 39%, well below the human-expert baseline of around 70%. GPT-5.2, the update released in December, scores 92% according to OpenAI.
Weil predicts a shift: "I think 2026 will be for science what 2025 was for software engineering." At the beginning of 2025, using AI to write code made you an early adopter. Twelve months later, not using AI means falling behind, he says. "I think that in a year, if you're a scientist and you're not heavily using AI, you'll be missing an opportunity to increase the quality and pace of your thinking."
The company aims to develop an autonomous research agent by 2028. Weil previously held more traditional product leadership roles at OpenAI, and before that at Instagram and Twitter. As head of OpenAI for Science, he now combines that experience with his scientific training: Weil studied physics.
AI collaborator, not Einstein bot
Weil sees the model's strengths primarily in making connections. "GPT-5.2 has read substantially every paper written in the last 30 years," he says. "And it understands not just the field that a particular scientist is working in; it can bring together analogies from other, unrelated fields."
That's powerful, says Weil: "You can always find a human collaborator in an adjacent field, but it's difficult to find, you know, a thousand collaborators in all thousand adjacent fields that might matter." Additionally, the model works at night and can handle ten queries in parallel - "which is kind of awkward to do to a human."
In November, OpenAI published case studies from scientists who are already using GPT-5 in their research. Most of them approached OpenAI, not the other way around, Weil emphasizes.
After deleted posts: Weil strikes a more humble tone
The enthusiasm at OpenAI recently got ahead of the facts. In October, senior figures at the company, including Weil himself, boasted on X that GPT-5 had found solutions to several unsolved math problems. Mathematicians quickly corrected them: GPT-5 had actually dug up existing solutions in old research papers. Weil and his colleagues deleted their posts.
Now Weil is more careful. It is often enough to find answers that exist but have been forgotten, he says. "We collectively stand on the shoulders of giants, and if LLMs can kind of accumulate that knowledge so that we don't spend time struggling on a problem that is already solved, that's an acceleration all of its own."
He tempers expectations that LLMs will soon make a groundbreaking new discovery: "I don't think models are there yet. Maybe they'll get there. I'm optimistic." But that's not the mission anyway: "Our mission is to accelerate science. And I don't think the bar for the acceleration of science is, like, Einstein-level reimagining of an entire field."
Mathematician Tao sees progress - with caveats
Renowned mathematician Terence Tao also confirms that AI models are making progress in mathematics. He recently reported that GPT-5.2 Pro solved an Erdős problem "more or less autonomously" - in contrast to the October case.
However, Tao attaches substantial caveats to such results. Many Erdős problems have never been systematically investigated, so if a 50-year-old problem is now solved by an AI, that doesn't mean it "resisted all human efforts" for 50 years. Tao estimates that only about one to two percent of open problems are simple enough to be solved with today's AI tools with minimal human assistance. The harder the problem, the more human guidance is still required.
OpenAI working on "epistemological humility"
Hallucinations, of course, remain a problem. But for Weil, this has two sides. A colleague, a former math professor, told him: "When I'm doing research, if I'm bouncing ideas off a colleague, I'm wrong 90% of the time and that's kind of the point. We're both spitballing ideas and trying to find something that works."
That's exactly how one should understand GPT-5, says Weil: not as an oracle, but as a sparring partner. To ensure the model sees itself that way too, OpenAI is now working on dialing down its confidence. Instead of saying "Here's the answer," it should tell scientists: "Here's something to consider."
"That's actually something that we are spending a bunch of time on," says Weil. "Trying to make sure that the model has some sort of epistemological humility." Whether this goes beyond mere rhetoric in practice remains to be seen.
Another approach: GPT-5 should check itself. When you feed one of GPT-5's answers back into the model, it often picks it apart and highlights mistakes. "You can kind of hook the model up as its own critic," Weil explains.
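The loop Weil describes is simple to sketch in code. The following is a minimal, hypothetical example using the OpenAI Python SDK; the model name, prompts, and helper function are illustrative placeholders, not OpenAI's internal setup:

```python
# Minimal sketch of a self-critique loop, assuming the OpenAI Python SDK.
# Model name and prompts are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
MODEL = "gpt-5.2"  # hypothetical model identifier


def answer_then_critique(question: str) -> tuple[str, str]:
    # First pass: ask the model for an answer.
    answer = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content

    # Second pass: feed the answer back and have the model act as its own critic.
    critique = client.chat.completions.create(
        model=MODEL,
        messages=[
            {
                "role": "system",
                "content": "You are a skeptical reviewer. Point out errors, gaps, and unsupported claims.",
            },
            {
                "role": "user",
                "content": f"Question: {question}\n\nProposed answer:\n{answer}\n\nCritique this answer.",
            },
        ],
    ).choices[0].message.content

    return answer, critique
```

In this sketch the critique pass sees only the question and the proposed answer, which keeps it from simply defending its own earlier reasoning.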