OpenAI doesn't see AI as a replacement for doctors, but as a replacement for not going to the doctor at all.
“I really don’t think you end up displacing doctors,” said Nick Turley, who heads up ChatGPT at OpenAI, on the company's official podcast. “You end up displacing not going to the doctor.”
Turley argues that AI systems like ChatGPT aren't meant to take jobs away from medical professionals, but to empower patients - especially in places where access to care is limited. “You end up democratizing the ability to get a second opinion,” he said. “Very few people have that resource or know to take advantage of a resource like that.”
ChatGPT in the hands of medical professionals
This kind of support isn't just for patients. Doctors themselves are already using ChatGPT to double-check their thinking or gain new perspectives. But for AI to truly earn trust in medicine, Turley says it's not enough for the models to simply be good: “There’s work to make the model really, really good – and there’s also work to prove that the model is really good.”
Both users and, especially, professionals need a clear understanding of where these models are reliable and where they're not. Until there's solid proof and systematic testing, trust will remain a major challenge for AI-powered medicine. As models get better, Turley warns, it actually becomes harder to spot and communicate their limits. “Once it gets to human and then superhuman level performances, it’s hard to frame exactly where it will fall short," he said.
Still, Turley sees enormous potential: "That opportunity is one of the things that gets me up in the morning." Alongside education, he believes healthcare is where AI could have the biggest impact on society.
Benchmarks are one thing - real-world medicine is another
In a recent benchmark, OpenAI reports that its latest models, GPT-4.1 and o3, produce responses that outperform doctors' in medical dialogue scenarios. At the same time, new systems like Microsoft's MAI-DxO are showing that orchestrated AI models can even surpass experienced physicians in complex diagnoses - both in accuracy and cost efficiency.
But these tests are highly controlled, and direct comparisons to real clinical settings are limited. Performing well on a benchmark doesn't mean an AI system is proven in real-world interactions. For example, a study from the University of Oxford found that people sometimes made worse medical decisions with AI assistance than a control group using a search engine, often because the conversation with the chatbot broke down. Yet there are also regular reports from users who say ChatGPT helped diagnose a rare disease after years of searching for answers.