Google is field-testing Med-PaLM 2, the medical version of its PaLM language model, with early customers in a clinical setting.
Med-PaLM 2 has been trained on questions and answers from medical licensing exams to improve its ability to answer medical questions. The model can summarize medical documents, organize health data, and generate answers to medical questions.
According to the Wall Street Journal, initial testing of Med-PaLM 2 is underway at healthcare facilities in the U.S., including the Mayo Clinic. Google believes its model could be particularly useful in countries with "limited access to doctors."
Customer data submitted during the Med-PaLM 2 trial will be encrypted, inaccessible to Google, and controlled by the customers themselves, the WSJ reports.
Med-PaLM 2 can provide expert-level medical information, but it will still make mistakes
Google announced the first clinical trials of Med-PaLM 2 in April of this year. According to Google, Med-PaLM 2 delivers 18 percent better performance than its predecessor and far outperforms similar models on medical tasks.
Google says Med-PaLM 2 is the first language model to achieve over 85 percent accuracy on questions similar to those on the U.S. Medical Licensing Examination (USMLE). The model achieved a "satisfactory score" of 72.3 percent on the MedMCQA dataset, which includes questions from India's AIIMS and NEET medical entrance exams.
Google researcher Greg Corrado, who helped develop Med-PaLM 2, describes the model as a technology he would not yet use for his own family's health care. But it still expands the possibilities of AI in medicine tenfold, Corrado says.
As AI enters the healthcare sector, concerns have been raised about the handling of sensitive patient data. The potential risks of AI-generated medical advice are also being discussed. Google introduced the first Med-PaLM at the end of 2022.
A study published at the end of April 2023 showed that even ChatGPT based on GPT-3.5, without any medical fine-tuning, can receive higher ratings for quality and empathy than physician responses to medical questions when evaluated by human raters.