Google's AMIE is a medical chatbot for expert-level differential diagnosis

Jan 14, 2024 Maximilian Schreiner

Google's AMIE is a chatbot focusing on differential diagnosis that relies on self-play for better diagnostic conversations.

Google Research and Google DeepMind have developed a medical chatbot called Articulate Medical Intelligence Explorer (AMIE). Unlike other healthcare AI systems, such as Med-PaLM 2, which typically focus on creating medical summaries or answering medical questions, AMIE is designed to serve as a diagnostic tool and create differential diagnoses.

AMIE is based on Google's PaLM and has been trained on datasets covering medical reasoning, medical summaries, and real clinical conversations. AMIE also uses a simulated learning environment with automatic feedback mechanisms.

https://the-decoder.de/wp-content/uploads/2024/01/Google-AMIE.mp4?_=1

Video: Google

The self-play-based diagnostic dialogue environment consists of two loops. In the "inner" loop, AMIE conducts simulated conversations with an AI patient simulator, and a reviewer then evaluates them. In the "outer" loop, the simulated dialogues that received a positive evaluation serve as material for fine-tuning subsequent iterations of AMIE. According to Google, this iterative process, combined with chain-of-thought reasoning, has significantly improved AMIE's dialogue quality. A paper on an earlier version reports better results in differential diagnosis than OpenAI's GPT-4.
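The two-loop structure can be illustrated with a toy sketch. This is not Google's implementation: the function names, the random stand-in for the reviewer, the acceptance threshold, and the "version counter" that stands in for fine-tuning are all assumptions made purely to show the control flow.

```python
import random
from dataclasses import dataclass


@dataclass
class Dialogue:
    turns: list          # alternating doctor/patient utterances
    score: float = 0.0   # reviewer score in [0, 1]


def simulate_dialogue(model_version, n_turns=4):
    """Hypothetical stand-in for AMIE talking to an AI patient simulator."""
    turns = []
    for i in range(n_turns):
        speaker = f"doctor(v{model_version})" if i % 2 == 0 else "patient"
        turns.append(f"{speaker}: utterance {i}")
    return Dialogue(turns)


def review(dialogue, rng):
    """Hypothetical reviewer; a real one would judge the dialogue content."""
    return rng.random()


def self_play(outer_iterations=3, dialogues_per_iteration=20,
              threshold=0.5, seed=0):
    rng = random.Random(seed)
    model_version = 0
    training_pool = []
    for _ in range(outer_iterations):
        # Inner loop: simulate conversations and have each one reviewed.
        accepted = []
        for _ in range(dialogues_per_iteration):
            dialogue = simulate_dialogue(model_version)
            dialogue.score = review(dialogue, rng)
            if dialogue.score >= threshold:
                accepted.append(dialogue)
        # Outer loop: positively reviewed dialogues become fine-tuning
        # material; bumping the version is a placeholder for fine-tuning.
        training_pool.extend(accepted)
        model_version += 1
    return model_version, training_pool
```

Calling `self_play()` returns the final version counter and the pool of accepted dialogues. In a real system the placeholder steps would be replaced by model inference, an automated critic, and an actual fine-tuning run.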

AMIE outperforms human doctors in text consultation

The team tested AMIE in a randomized, double-blind crossover study in which actors playing patients were randomly assigned to text-based consultations with either real doctors or AMIE. The team found that the AI system was at least as effective as the human doctors in these simulated diagnostic consultations. According to Google, it achieved higher diagnostic accuracy and performed better on many clinically important aspects of consultation quality. The consultations were rated by both medical specialists and the patient actors.
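How a randomized, blinded crossover assignment works in general can be sketched as follows. The article does not describe the study's actual procedure, so everything here is an assumption: that each scenario is consulted under both arms in random order, the scenario names, and the coded consultation identifiers used to hide the arm from raters.

```python
import random


def assign_crossover(scenarios, seed=0):
    """Give every scenario both arms, in a randomized order (crossover)."""
    rng = random.Random(seed)
    schedule = []
    for scenario in scenarios:
        order = ["AMIE", "doctor"]
        rng.shuffle(order)
        for period, arm in enumerate(order, start=1):
            schedule.append({"scenario": scenario, "period": period, "arm": arm})
    return schedule


def blind(schedule, seed=0):
    """Replace the arm with an opaque code; keep the key separately."""
    rng = random.Random(seed)
    ids = list(range(len(schedule)))
    rng.shuffle(ids)
    blinded, key = [], {}
    for entry, i in zip(schedule, ids):
        code = f"consult-{i:03d}"
        key[code] = entry["arm"]  # only unblinded after rating is complete
        blinded.append({"scenario": entry["scenario"], "consultation": code})
    return blinded, key


# Hypothetical scenario names, for illustration only.
schedule = assign_crossover(["scenario-1", "scenario-2", "scenario-3"])
blinded, key = blind(schedule)
```

Raters would only ever see the `blinded` records, while the `key` mapping is held back until all ratings are in. That separation is what keeps the comparison between the two arms blind.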

However, real conversations are often conducted face-to-face, so the study may underestimate the actual value of human conversation, according to Google. The study also simulated rarer illnesses.

Extensive further research is needed to turn AMIE from a research prototype into a robust clinical tool, says the team, including research on fairness, privacy, robustness, and the system's performance in real-life conditions.

Sources:

Google, arXiv