
ChatGPT aces medical exams, forcing a rethink on how we train tomorrow's doctors

Matthias Bastian
Image: Midjourney prompted by THE DECODER

A Stanford study shows that ChatGPT outperforms medical students on complex case-based questions, prompting a rethink of medical education.

Researchers at Stanford have found that ChatGPT can outperform first- and second-year medical students in answering complex clinical care questions.

The study, published in JAMA Internal Medicine, highlights the growing influence of AI on medical education and practice, and suggests that teaching methods for future physicians may need to change.

"We don't want doctors who were so reliant on AI at school that they failed to learn how to reason through cases on their own," says co-author Alicia DiGiammarino, education manager at the School of Medicine. "But I'm more scared of a world where doctors aren't trained to effectively use AI and find it prevalent in modern practice."

AI beats medical students

Recent studies have demonstrated ChatGPT's ability to handle multiple-choice questions on the United States Medical Licensing Examination (USMLE). But the Stanford authors wanted to examine how the AI system handles the more difficult, open-ended questions used to assess clinical reasoning skills.

The study found that, on average, the AI model scored more than four points higher than medical students on the case-report portion of the exam. This result suggests that AI tools like ChatGPT could disrupt the traditional teaching and testing of medical reasoning through written text. The researchers also noted a significant jump in performance over GPT-3.5, which was only "borderline passing" on the same questions.

ChatGPT and other programs like it are changing how we teach and ultimately practice medicine.

Alicia DiGiammarino

Despite its impressive performance, ChatGPT is not without shortcomings. The biggest danger is invented facts, known as hallucinations or confabulations. These are significantly reduced in OpenAI's latest model, GPT-4, which is available to paying customers and via API, but they are still very much present.

It is easy to imagine how even sporadic errors can have dramatic consequences in medicine. Embedded in an overall curriculum with multiple sources of truth, however, such errors become a much smaller problem.

Stanford's School of Medicine cuts students' line to ChatGPT in exams

ChatGPT's influence is already being felt at Stanford's School of Medicine, where it raises concerns about exam integrity and curriculum design. Administrators have switched from open-book to closed-book exams to ensure that students develop clinical reasoning skills without relying on AI. But they have also created an AI working group to explore integrating AI tools into medical education.

Beyond education, there are other areas where AI can have a significant impact on healthcare. For example, medical AI startup Insilico Medicine recently administered the first dose of a drug designed with generative AI to patients in a Phase II clinical trial.

Google is field-testing Med-PaLM 2, a version of its large language model PaLM 2 fine-tuned to answer medical questions. Another study suggests that GPT-4 can help doctors answer patients' questions with more detail and empathy. Yes, you read that right: more empathy.
