
Google's VaultGemma shows the struggle to balance privacy and performance in AI

Google DeepMind has introduced a new language model called VaultGemma, designed with a focus on privacy. It is the largest open model to date trained from scratch with differential privacy, containing 1 billion parameters.

Normally, large language models can memorize parts of their training data, including sensitive information like names, addresses, or entire documents. Differential privacy counters this by adding calibrated random noise during training, which mathematically limits how much any single training example can influence the model's outputs and keeps them from being traced back to specific records. In theory, even if VaultGemma were trained on confidential documents, those documents could not be reconstructed later.
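To make the idea concrete, here is a minimal sketch of a DP-SGD-style update, the standard way differential privacy is applied during training: each example's gradient is clipped to a fixed norm, then Gaussian noise scaled to that clipping bound is added before the weights are updated. This is an illustrative toy on a logistic-regression model, not Google's training code; the hyperparameters `clip_norm` and `noise_multiplier` are hypothetical values chosen only for demonstration.

```python
# Illustrative DP-SGD-style update on a toy logistic-regression model.
# Not Google's actual training code; hyperparameters are placeholders.
import numpy as np

rng = np.random.default_rng(0)

def per_example_gradients(w, X, y):
    """Logistic-regression gradient computed separately for each example."""
    preds = 1.0 / (1.0 + np.exp(-X @ w))   # sigmoid predictions
    return (preds - y)[:, None] * X        # shape: (batch, dim)

def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.1):
    grads = per_example_gradients(w, X, y)
    # 1. Clip each example's gradient so no single record dominates the update.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # 2. Sum the clipped gradients and add Gaussian noise scaled to the clip bound.
    noisy_sum = grads.sum(axis=0) + rng.normal(
        scale=noise_multiplier * clip_norm, size=w.shape
    )
    # 3. Average over the batch and take a normal gradient step.
    return w - lr * noisy_sum / len(X)

# Toy usage: 32 random examples with 8 features.
X = rng.normal(size=(32, 8))
y = rng.integers(0, 2, size=32).astype(float)
w = np.zeros(8)
for _ in range(100):
    w = dp_sgd_step(w, X, y)
```

Because the noise is calibrated to the clipping bound, the update looks nearly the same whether or not any one example is in the batch, which is what limits memorization; it also explains the performance cost mentioned below, since the noise degrades the training signal.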

According to Google, early tests confirm that the model does not reproduce training data. The tradeoff is performance: its output quality is roughly comparable to that of non-private LLMs released about five years ago.

The model weights are openly available on Hugging Face and Kaggle.

Source: Google