
Google has added new models for code completion and more efficient inference to the Gemma family. Terms of use have been made more flexible.

Google announced today that it is expanding its Gemma family of AI models. Gemma was first released in February and includes lightweight models that use the same technology as Google's larger Gemini models. It's Google's foot in the door in the open-source market.

Gemma for code

CodeGemma, a new model that helps programmers write code, comes in three variants:

  • A pre-trained 7 billion parameter model for completing code and generating new code
  • A 7 billion parameter model optimized for chatting about code and following instructions
  • A pre-trained 2 billion parameter model for fast code completion on local devices

CodeGemma does not top the benchmarks, but it performs well and does not fall far behind the leading models. | Image: Google DeepMind

CodeGemma was trained on 500 billion tokens of data from web documents, math, and code. According to Google, it generates syntactically correct and semantically meaningful code in Python, JavaScript, Java, and other popular programming languages. Google says CodeGemma is meant to let developers write less repetitive code and focus on harder tasks.
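Code completion models of this kind are usually prompted in a fill-in-the-middle format: the code before and after the cursor is wrapped in control tokens, and the model generates the missing piece. The following is a minimal sketch of how such a prompt could be assembled. The control token strings are taken from our reading of the CodeGemma model card and should be treated as assumptions to verify; the helper name `build_fim_prompt` is hypothetical and not part of any Google library.

```python
# Assumed fill-in-the-middle control tokens; check them against the model card
# of the model you actually use before relying on them.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Wrap the code before and after the cursor so the model fills the gap."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


if __name__ == "__main__":
    # Code before the cursor and code after it; the model would be asked
    # to produce whatever belongs in between.
    before = "def mean(values):\n    total = "
    after = "\n    return total / len(values)\n"
    print(build_fim_prompt(before, after))
```

The resulting string would then be passed to the model as an ordinary text prompt, and the generated continuation inserted at the cursor position.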


Gemma for more efficient inference

Google also released RecurrentGemma, a separate model that uses recurrent neural networks and local attention to be more memory efficient. It performs similarly to the 2 billion parameter Gemma model, but has some benefits:

  • It uses less memory for longer text generation on devices with limited memory, like single GPUs or CPUs.
  • It can process text faster by using larger batch sizes and generating more words per second.
  • It advances AI research by showing how non-transformer models can still perform well.

RecurrentGemma efficiently stores and processes information from earlier steps without slowing down on longer text. In contrast, transformer models like Gemma have to calculate interactions between all parts of the text, which takes more computation and slows down as the text gets longer. | Image: Google DeepMind
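The memory argument can be illustrated with a rough back-of-the-envelope calculation. The sketch below compares a transformer's key-value cache, which stores keys and values for every past token, with a fixed-size recurrent state plus a local attention cache that is capped at a window length. All layer counts, dimensions, and the window size are hypothetical placeholders rather than the real Gemma or RecurrentGemma configurations, and the model is simplified by treating every layer as having both a recurrent state and a local attention cache.

```python
def kv_cache_bytes(seq_len, layers, kv_heads, head_dim, bytes_per_value=2):
    """Keys and values for every past token in every layer: grows with seq_len."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value


def recurrent_state_bytes(seq_len, layers, state_dim, window,
                          kv_heads, head_dim, bytes_per_value=2):
    """Fixed-size recurrent state plus a local-attention cache capped at `window` tokens."""
    state = layers * state_dim * bytes_per_value
    local = kv_cache_bytes(min(seq_len, window), layers, kv_heads,
                           head_dim, bytes_per_value)
    return state + local


if __name__ == "__main__":
    # Hypothetical configuration, chosen only to make the trend visible.
    cfg = dict(layers=24, kv_heads=1, head_dim=256)
    for n in (1_024, 8_192, 65_536):
        full = kv_cache_bytes(n, **cfg)
        rec = recurrent_state_bytes(n, state_dim=2_560, window=2_048, **cfg)
        print(f"{n:>6} tokens: full attention {full / 2**20:8.1f} MiB, "
              f"recurrent + local {rec / 2**20:8.1f} MiB")
```

Under these assumptions the full-attention cache grows linearly with the number of tokens, while the recurrent variant stops growing once the sequence is longer than the local attention window.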

Google also updated the original Gemma models to version 1.1 with performance improvements, bug fixes, and more flexible usage terms.

The new models are now available on Kaggle, NVIDIA NIM APIs, Hugging Face, and in the Vertex AI Model Garden. They work with tools including JAX, PyTorch, Hugging Face Transformers, Gemma.cpp, Keras, NVIDIA NeMo, TensorRT-LLM, Optimum-NVIDIA, and MediaPipe.

Summary

  • Google has expanded its Gemma model family with new variants for code completion and more efficient inference. In addition, the standard Gemma models have been updated to version 1.1 with more flexible usage terms.
  • According to Google, CodeGemma, which comes in several variants, generates syntactically correct and semantically meaningful code to free developers from routine tasks.
  • RecurrentGemma uses recurrent neural networks for lower memory consumption and higher throughput, with performance similar to the base Gemma 2B model.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.