Google DeepMind has introduced a new language model called VaultGemma, designed with a focus on privacy. It is the largest open model to date trained from scratch with differential privacy, containing 1 billion parameters.


Normally, large language models can memorize parts of their training data, including sensitive information like names, addresses, or entire documents. Differential privacy counters this by adding calibrated random noise during training, which mathematically bounds how much any single training example can influence the model's outputs. In theory, even if VaultGemma were trained on confidential documents, those documents could not be reconstructed from the model later.
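The standard way to apply differential privacy to model training is DP-SGD: clip each example's gradient to a fixed norm, average, and add Gaussian noise before updating the weights. The sketch below illustrates that idea in plain NumPy; it is a simplified illustration, not Google's actual training code, and the function name and parameters are hypothetical.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One illustrative DP-SGD update (simplified sketch):
    clip each example's gradient, average, then add Gaussian noise."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose norm exceeds clip_norm,
        # limiting one example's influence on the update.
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    # Noise scale is tied to the clipping norm, so the noise masks
    # the maximum possible contribution of any single example.
    noise = rng.normal(
        0.0,
        noise_multiplier * clip_norm / len(per_example_grads),
        size=mean_grad.shape,
    )
    return mean_grad + noise
```

The clipping step is what makes the privacy guarantee possible: because no single example can shift the average by more than `clip_norm`, a fixed amount of noise is enough to statistically hide its presence.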

According to Google, early tests confirm that the model does not reproduce training data. The tradeoff is performance: its output quality is roughly comparable to that of non-private LLMs released about five years ago.

The model weights are openly available on Hugging Face and Kaggle.

Max is the managing editor of THE DECODER, bringing his background in philosophy to explore questions of consciousness and whether machines truly think or just pretend to.