A study comparing the performance of OpenAI's ChatGPT (GPT-3.5), Google Bard, and Microsoft Bing Chat (Precision mode) on 77 physiology case vignettes found that ChatGPT significantly outperformed the others (ChatGPT 3.19±0.3, Bard 2.91±0.5, Bing Chat 2.15±0.6, on a scale of 0 to 4). Two physiologists independently scored each model's responses for accuracy.
While the results highlight the potential of incorporating AI systems into medical education, the study acknowledges that further research is needed to determine how effective these models are across different medical fields. It's also possible that AI models fine-tuned specifically for medical tasks will win the race, such as Google's recently unveiled Med-PaLM M, which also incorporates vision.