ChatGPT aces medical exams, forcing a rethink on how we train tomorrow's doctors

A Stanford study shows that ChatGPT outperforms medical students on complex case-based questions, prompting a rethink of medical education.

Researchers at Stanford have found that ChatGPT can outperform first- and second-year medical students in answering complex clinical care questions.

The study, published in JAMA Internal Medicine, highlights the growing influence of AI on medical education and practice and suggests that adjustments in teaching methods may be needed for future physicians.

"We don't want doctors who were so reliant on AI at school that they failed to learn how to reason through cases on their own," says co-author Alicia DiGiammarino, education manager at the School of Medicine. "But I'm more scared of a world where doctors aren't trained to effectively use AI and find it prevalent in modern practice."

AI beats medical students

Recent studies have demonstrated ChatGPT's ability to handle multiple-choice questions on the United States Medical Licensing Examination (USMLE). But the Stanford authors wanted to examine the AI system's ability to handle the more difficult, open-ended questions used to assess clinical reasoning skills.

The study found that, on average, the AI model scored more than four points higher than medical students on the case report portion of the exam. This result suggests that AI tools like ChatGPT could disrupt the traditional teaching and testing of medical reasoning through written text. The researchers also noted a significant improvement over GPT-3.5, which was only "borderline passing" on the questions.

"ChatGPT and other programs like it are changing how we teach and ultimately practice medicine," says Alicia DiGiammarino.

Despite its impressive performance, ChatGPT is not without its shortcomings. The biggest danger is invented facts, so-called hallucinations or confabulations. These have been significantly reduced in OpenAI's latest model, GPT-4, which is available to paying customers and via API, but they are still very much present.

Even very sporadic errors can have dramatic consequences when it comes to medical topics. When the tool is embedded in a broader curriculum with multiple sources of truth, however, this seems like a much smaller problem.

Stanford's School of Medicine cuts students' line to ChatGPT in exams

Concerns about exam integrity and ChatGPT's influence on curriculum design are already being felt at Stanford's School of Medicine. Administrators have switched from open-book to closed-book exams to ensure that students develop clinical reasoning skills without relying on AI. But they have also created an AI working group to explore the integration of AI tools into medical education.

Beyond education, there are other areas where AI can have a significant impact on healthcare. For example, medical AI startup Insilico Medicine recently administered the first dose of a drug developed with generative AI to patients in a Phase II clinical trial.

Google is field-testing Med-PaLM 2, a version of its large language model PaLM 2 fine-tuned to answer medical questions. Another study suggests that GPT-4 can help doctors answer patients' questions with more detail and empathy. Yes, you read that right: more empathy.
