Google DeepMind has introduced a new language model called VaultGemma, designed with a focus on privacy. It is the largest open model to date trained from scratch with differential privacy, containing 1 billion parameters.


Normally, large language models can memorize parts of their training data, including sensitive information like names, addresses, or entire documents. Differential privacy counters this by adding calibrated random noise during training, which mathematically bounds how much any single training example can influence the model's outputs. In theory, even if VaultGemma were trained on confidential documents, those documents could not be reconstructed from the model later.
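The standard way to apply differential privacy to model training is DP-SGD: clip each example's gradient to a fixed norm, average, and add Gaussian noise before updating the weights. The sketch below illustrates that idea in plain NumPy; it is a simplified illustration, not Google's actual training code, and the function name and parameters are hypothetical.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One illustrative DP-SGD update (simplified sketch):
    clip each example's gradient, average, then add Gaussian noise."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose norm exceeds clip_norm,
        # limiting one example's influence on the update.
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    # Noise scale is tied to the clipping norm, so the noise masks
    # the maximum possible contribution of any single example.
    noise = rng.normal(
        0.0,
        noise_multiplier * clip_norm / len(per_example_grads),
        size=mean_grad.shape,
    )
    return mean_grad + noise
```

The clipping step is what makes the privacy guarantee possible: because no single example can shift the average by more than `clip_norm`, a fixed amount of noise is enough to statistically hide its presence.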

According to Google, early tests confirm that the model does not reproduce training data. The tradeoff is performance: its output quality is roughly comparable to that of non-private LLMs released about five years ago.

The model weights are openly available on Hugging Face and Kaggle.

Max is the managing editor of THE DECODER, bringing his background in philosophy to explore questions of consciousness and whether machines truly think or just pretend to.