Google Research and DeepMind have released MedGemma, an open-source collection of AI models built specifically for medical use.

The MedGemma family includes a 4B model that can handle text, images, or both, along with a larger 27B version in text-only and multimodal formats. Google introduced the collection at this year's I/O conference.

MedGemma is designed for a variety of medical fields, including radiology, dermatology, histopathology, and ophthalmology. According to Google, the models can serve as a foundation for new healthcare AI tools and work either independently or within agent-based systems.

Collage: Chest X-ray showing left-sided pneumothorax with drainage and consolidation; pearly papule; red papules.
MedGemma analyzes X-rays and skin photos to generate diagnostic suggestions for different medical fields. | Image: Google

MedGemma delivers gains over standard models

The technical report indicates that MedGemma offers substantial improvements over similarly sized foundation models: up to 10 percent higher accuracy on multimodal Q&A, 15.5 to 18.1 percent better results on X-ray classification, and a 10.8 percent boost in complex, agent-based evaluations.

Benchmark scores bear this out. On MedQA, which tests medical exam questions, the 4B model reaches 64.4 percent accuracy, compared to 50.7 percent for the baseline. The 27B version scores 87.7 percent, up from 74.9 percent.

Accuracy scores of small and large LLMs on text-based medical benchmarks (MedQA, MedMCQA, PubMedQA) and MMLU subdomains.
MedGemma consistently outperforms its base model on medical benchmarks. | Image: Google

The multimodal results show a similar pattern. In testing on the MIMIC-CXR dataset of X-ray images and reports, the 4B version posted a macro F1 score of 88.9, compared to 81.2 for the original Gemma 3 4B model. The macro F1 score averages performance across the different medical conditions in the dataset, so frequent findings don't drown out rare ones.
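
To make the metric concrete, here is a minimal, dependency-free sketch of macro F1: compute F1 per class, then take the unweighted mean. The condition labels in the toy example are illustrative, not from the MIMIC-CXR evaluation.

```python
def macro_f1(y_true, y_pred):
    """Average per-class F1 scores so each condition counts equally,
    regardless of how often it appears in the dataset."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Toy labels: a model that always predicts "normal" misses the rare
# pneumothorax case, and the macro average punishes that heavily.
y_true = ["normal", "normal", "pneumothorax", "normal"]
y_pred = ["normal", "normal", "normal", "normal"]
print(round(macro_f1(y_true, y_pred), 3))  # → 0.429
```

Plain accuracy on the same toy labels would be 75 percent, which is why macro F1 is the preferred metric when some conditions are rare.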

MedSigLIP: Specialized image encoder

For image processing, Google is introducing MedSigLIP, a medical image encoder with 400 million parameters. MedSigLIP builds on SigLIP ("Sigmoid Loss for Language Image Pre-training"), a system designed to link images with text. The medical version extends this, allowing MedGemma to interpret medical images more effectively.
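
The sigmoid objective behind SigLIP can be sketched in a few lines: every image-text pair in a batch is scored independently as match (+1, on the diagonal) or mismatch (-1), with no batch-wide softmax normalization. In the real model the temperature `t` and bias `b` are learnable; they are fixed here purely for illustration.

```python
import math

def siglip_loss(sim, t=10.0, b=-10.0):
    """Pairwise sigmoid loss over an n x n image-text similarity matrix.
    Matching pairs sit on the diagonal (label +1); all other pairs are
    negatives (label -1). Each pair contributes -log sigmoid(z * logit)
    independently, unlike softmax contrastive loss."""
    n = len(sim)
    total = 0.0
    for i in range(n):
        for j in range(n):
            z = 1.0 if i == j else -1.0
            logit = t * sim[i][j] + b
            total += math.log1p(math.exp(-z * logit))  # -log sigmoid(z*logit)
    return total / n

# Toy 2x2 similarity matrix: matched pairs (diagonal) score highest.
sim = [[1.0, 0.1],
       [0.2, 0.9]]
print(round(siglip_loss(sim), 3))  # → 1.003
```

Because each pair is an independent binary decision, the loss scales to large batches without the all-pairs normalization that softmax contrastive training requires.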

Schema: MedSigLIP encodes medical images for MedGemma 4B, while MedGemma 27B processes medical text.
MedSigLIP encodes medical image data while MedGemma 27B handles clinical text, creating a multimodal AI system for healthcare. | Image: Google

As a standalone encoder, MedSigLIP works at a resolution of 448 x 448 pixels, making it more efficient than the higher-resolution 896 x 896 variant used inside MedGemma.

The model was trained on over 33 million image-text pairs, including 635,000 examples from different medical domains and 32.6 million histopathology patches. To preserve SigLIP's general image recognition, the original training data was retained, with medical data making up just 2 percent of the mix; this lets the encoder handle both general and medical content.
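
Google's actual training pipeline isn't public, but the fixed-fraction mixing described above can be illustrated with a toy batch sampler that holds medical data at 2 percent of each batch. The function name and data format here are assumptions for the sketch.

```python
import random

def mixed_batch(general, medical, batch_size=100, medical_frac=0.02):
    """Sample a training batch in which medical examples make up a fixed
    fraction (here 2%), so the encoder keeps its general-domain skills.
    The 2% figure comes from the article; the sampling scheme itself is
    an illustrative assumption, not Google's documented method."""
    n_med = round(batch_size * medical_frac)
    batch = random.choices(medical, k=n_med) + \
            random.choices(general, k=batch_size - n_med)
    random.shuffle(batch)
    return batch

random.seed(0)
general = [("photo", i) for i in range(1000)]
medical = [("xray", i) for i in range(1000)]
batch = mixed_batch(general, medical)
print(sum(1 for kind, _ in batch if kind == "xray"))  # → 2 of 100
```

Capping the medical share this way is a standard guard against catastrophic forgetting: the encoder sees mostly its original distribution while still absorbing the new domain.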

Fine-tuning for real-world medical tasks

Researchers showed how MedGemma can be fine-tuned for specific medical tasks. For automated X-ray report generation, the RadGraph F1 score improved from 29.5 to 30.3, indicating better capture of essential clinical information. For pneumothorax (collapsed lung) detection, accuracy jumped from 59.7 to 71.5. In histopathology, the weighted F1 score for tissue classification rose from 32.8 to 94.5.

Video: Google

A major leap came in electronic health record analysis: reinforcement learning cut the error rate in data retrieval by half, promising new efficiencies in handling patient data.

MedGemma is available on Hugging Face. The license permits research, development, and general AI uses, but not direct medical diagnosis or treatment without regulatory approval. Commercial use is allowed as long as the restrictions are followed.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.

Benchmarks aren't the same as the real world

Last year, Google launched a medical AI model built on its closed Gemini platform. MedGemma’s open-source foundation and support for customization could encourage broader adoption.

Still, strong benchmark results don’t always translate into clinical practice. One study found that real-world effectiveness can be limited by misunderstandings or incorrect user interactions, highlighting the gap between test scores and practical results.

Summary
  • Google Research and DeepMind have released MedGemma, an open-source suite of AI models designed for medical fields like radiology, dermatology, histopathology, and ophthalmology.
  • The technical report finds that MedGemma models perform much better than standard models on a range of medical tasks.
  • Fine-tuning results show strong adaptability for specialized uses, including a big improvement in tissue classification accuracy (F1 score jumps from 32.8 to 94.5) and a 50% reduction in error rate for analyzing electronic patient records.
Jonathan writes for THE DECODER about how AI tools can make our work and creative lives better.