OpenAI says its new ChatGPT for Clinicians outperforms doctors on clinical tasks even when they have unlimited time and web access
Key Points
- OpenAI has launched "ChatGPT for Clinicians," a free AI tool designed specifically for everyday medical practice, available to verified healthcare professionals in the US.
- The system includes features like real-time clinical searches across specialist literature, templates for recurring workflows, and automatic recognition of continuing medical education credits.
- Alongside the launch, OpenAI published the "HealthBench Professional" benchmark, where the customized GPT-5.4 version scored 59.0 points, outperforming human doctors, who scored 43.7 points despite having unlimited time and internet access.
OpenAI is rolling out ChatGPT for Clinicians, a free version of its chatbot for medical professionals. A new benchmark claims GPT-5.4 beats human doctors on clinical tasks, even when those doctors have unlimited time and internet access.
OpenAI has launched a version of ChatGPT built specifically for clinical work. It's free for verified physicians, nurses with advanced clinical qualifications, physician assistants, and pharmacists in the US. Alongside it, the company is releasing HealthBench Professional, a new benchmark for clinical AI tasks. According to OpenAI, GPT-5.4 outperforms human doctors on it.
A benchmark built to be hard
HealthBench Professional measures AI performance across three clinical areas: consultations, writing and documentation, and medical research. It builds on the earlier HealthBench and uses doctor-written conversations, multi-level physician scoring, and targeted data filtering.
OpenAI says the benchmark was designed to be tough. About a third of the examples come from targeted "red teaming," where doctors actively tried to find weaknesses in the models. The hardest conversations were overrepresented by a factor of 3.5.
GPT-5.4 running in the ChatGPT for Clinicians workspace scored 59.0 overall on HealthBench Professional. Doctor-written responses came in at 43.7, even with unlimited time and internet access. Every other model tested scored well below the Clinicians version: the base GPT-5.4 hit 48.1, Anthropic's Claude Opus 4.7 reached 47.0, Google's Gemini 3.1 Pro scored 43.8, and xAI's Grok 4.2 landed at 36.1.

GPT-5.4 in the Clinicians workspace scores about 11 points higher than the base GPT-5.4 (59.0 vs. 48.1). How much of that comes from the clinical setup itself versus the way the benchmark is built is unclear, and benchmark scores don't necessarily translate to real clinical practice.
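The gaps between the reported scores can be checked with a few lines of Python. This is only an illustrative sketch: the scores are the ones quoted above, while the dictionary labels and the `gaps_to` helper are hypothetical names chosen for this example and are not part of OpenAI's benchmark tooling.

```python
# HealthBench Professional scores as reported in the article.
# The labels are our own shorthand, not official model identifiers.
scores = {
    "GPT-5.4 (ChatGPT for Clinicians)": 59.0,
    "GPT-5.4 (base)": 48.1,
    "Claude Opus 4.7": 47.0,
    "Gemini 3.1 Pro": 43.8,
    "Physicians (unlimited time, web access)": 43.7,
    "Grok 4.2": 36.1,
}


def gaps_to(reference, table):
    """Return how many points each entry trails the reference entry."""
    ref = table[reference]
    return {
        name: round(ref - value, 1)
        for name, value in table.items()
        if name != reference
    }


gaps = gaps_to("GPT-5.4 (ChatGPT for Clinicians)", scores)

# Print the smallest gap first.
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name}: {gap:.1f} points behind")
```

On these figures the base GPT-5.4 trails the Clinicians version by 10.9 points and the physician-written responses by 15.3 points. The calculation says nothing about why the gaps exist.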
99.6 percent of answers rated reliable
There's an obvious methodological wrinkle here: OpenAI built the benchmark and tested its own models on it. The company points to third-party evaluations like Stanford's MedHELM and MedMarks, where OpenAI models also rank at the top, and the benchmark and dataset are openly available.
OpenAI says ChatGPT for Clinicians was developed with hundreds of medical advisors. Before launch, doctors tested 6,924 conversations in their everyday clinical work, and 99.6 percent of the responses were rated safe and accurate, according to Karan Singhal from OpenAI's Health unit.
In a subset of 355 examples where three independent doctors each specified correct sources, ChatGPT for Clinicians cited those sources more often than human doctors did. In total, more than 700,000 model responses have been reviewed by physicians so far. OpenAI stresses that the tool is meant to support clinicians, not replace their judgment.
Clinical search, reusable workflows, and CME credits
According to OpenAI, ChatGPT for Clinicians comes with free access to the company's current frontier models, a clinical search function that pulls from millions of peer-reviewed sources with real-time citations, and a deep research feature for medical literature.
There are also "skills," which let clinicians turn recurring workflows, like referral letters, prior authorizations, or patient instructions, into reusable templates. One unusual feature: clinical research done in ChatGPT can count toward continuing medical education (CME) credits in the US.
On privacy, OpenAI says conversations won't be used for model training. Optional HIPAA compliance through a Business Associate Agreement is available for users handling protected health information.
US-only for now, global rollout planned
ChatGPT for Clinicians is launching only for verified clinicians in the US. OpenAI plans to expand internationally and is working with the Better Evidence Network on pilot projects outside the country. The company is also publishing a Health Blueprint with recommendations for responsibly integrating AI into the US healthcare system.
The push comes as AI adoption in medicine accelerates. A 2026 survey from the American Medical Association found that 72 percent of US doctors now use AI in clinical practice, up from 48 percent the year before. OpenAI says millions of clinicians worldwide already use ChatGPT weekly, with usage more than doubling over the past year.
Earlier this year, OpenAI launched ChatGPT for Healthcare for organizations, giving health systems institutional-level compliance and administrative controls. Anthropic, Microsoft, and Google are also pushing into the medical market with their own AI models, with Google focusing especially on drug development through Google DeepMind.