GPT-4 shines in Microsoft radiology study, outperforming human experts on some tasks

DALL-E 3 prompted by THE DECODER

Microsoft recently published a study that explores the capabilities and limitations of GPT-4 in radiology.

Working with a radiologist and Nuance, a Microsoft company whose PowerScribe solution is used by more than 80 percent of radiologists in the U.S., the research team created a comprehensive evaluation and error analysis framework.

Within this framework, the team evaluated GPT-4's ability to process radiology reports, covering both natural language understanding and generation tasks such as classifying diseases and summarizing findings. In selecting the tasks, the team emphasized complex and challenging real-world radiology scenarios.

GPT-4 outperforms even human radiologists

The study found that GPT-4 demonstrated new state-of-the-art performance on some tasks, outperforming existing systems by up to ten percent. Although GPT-4 occasionally fails to surface domain knowledge, it has "substantial capability in processing and analysis of radiology text, achieving near-ceiling performance in many tasks," the paper states.

GPT-4 can outperform existing systems on radiological text tasks by up to ten percent. | Image: Microsoft

In some cases, the radiology report summaries generated by GPT-4 were even more accurate and more detailed in their findings than those written by experienced radiologists.

The radiology reports generated by GPT-4 were preferred over those written by humans. | Image: Microsoft

Another promising aspect of GPT-4 is its ability to automatically structure radiology reports, which are often complex and unstructured. Studies have shown that structured reports can improve standardization and consistency in the description of disease.

Structured reports are easier for other healthcare providers to interpret and are more searchable for research and quality improvement initiatives.

GPT-4 could help improve real-world data (RWD) and its use for real-world evidence (RWE) to complement clinical trials and accelerate the translation of research into clinical practice.

Are large language models better specialists?

The results are encouraging, but need to be confirmed by further research and clinical trials, the researchers wrote. "When used with human oversight, GPT-4 also has the potential to transform radiology by assisting professionals in their day-to-day tasks."

Earlier in August, Microsoft researchers published study results showing that generalist AI models pre-trained with large amounts of data, such as GPT-4, can outperform specialized medical models.
