Google has added new models for code completion and more efficient inference to the Gemma family. Terms of use have been made more flexible.
Google announced today that it is expanding its Gemma family of AI models. First released in February, Gemma comprises lightweight models built with the same technology as Google's larger Gemini models. It's Google's foot in the door of the open-source market.
Gemma for code
CodeGemma, a new model that helps programmers write code, comes in three variants:
- A pre-trained 7 billion parameter model for completing code and generating new code
- A 7 billion parameter model optimized for chatting about code and following instructions
- A pre-trained 2 billion parameter model for fast code completion on local devices
CodeGemma was trained on 500 billion tokens of data from web documents, mathematics, and code. It can generate syntactically correct and semantically meaningful code in Python, JavaScript, Java, and other popular programming languages. According to Google, CodeGemma is meant to spare developers repetitive boilerplate so they can focus on harder tasks.
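For orientation, here is a minimal sketch of local code completion with the 2 billion parameter checkpoint through Hugging Face Transformers, one of the supported tools listed below. The model ID and the fill-in-the-middle prompt tokens are assumptions based on the public Hugging Face release, not details from this announcement.

```python
# Minimal sketch: fill-in-the-middle code completion with CodeGemma.
# Assumes the transformers and accelerate packages are installed and that
# "google/codegemma-2b" is the published checkpoint name (an assumption).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/codegemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The pre-trained checkpoints accept fill-in-the-middle prompts: the model
# generates the code that belongs between the prefix and the suffix.
prompt = (
    "<|fim_prefix|>def mean(values):\n    <|fim_suffix|>\n"
    "print(mean([1, 2, 3]))<|fim_middle|>"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)

# Decode only the newly generated tokens, i.e. the completed middle section.
completion = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(completion)
```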
Gemma for more efficient inference
Google also released RecurrentGemma, a separate model that combines recurrent neural networks with local attention to improve memory efficiency. It matches the benchmark performance of the 2 billion parameter Gemma model while offering several advantages:
- Its lower memory requirements allow it to generate longer text on devices with limited memory, such as a single GPU or CPU.
- The lower memory use also permits larger batch sizes, so it can generate substantially more tokens per second during inference.
- It advances AI research by demonstrating that non-transformer models can still deliver strong performance.
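As a rough sketch of what that looks like in practice, assuming the checkpoint name used on Hugging Face, RecurrentGemma loads through the same Transformers interface as the other Gemma models; only the architecture behind generation differs:

```python
# Minimal sketch: text generation with RecurrentGemma via Hugging Face
# Transformers. "google/recurrentgemma-2b" is an assumed checkpoint name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/recurrentgemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The recurrent state plus local attention keeps memory use roughly constant
# as generation proceeds, rather than growing with a full attention cache.
inputs = tokenizer(
    "Recurrent language models are memory efficient because",
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```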
Google also updated the original Gemma models to version 1.1 with performance improvements, bug fixes, and more flexible usage terms.
The new models are now available on Kaggle and Hugging Face, through NVIDIA NIM APIs, and in the Vertex AI Model Garden. They work with tools including JAX, PyTorch, Hugging Face Transformers, Gemma.cpp, Keras, NVIDIA NeMo, TensorRT-LLM, Optimum-NVIDIA, and MediaPipe.