Google DeepMind open-sources AI text watermarking for Gemini

Oct 23, 2024


Key Points

Google DeepMind has expanded its SynthID AI watermarking technology to cover text. The company has integrated this feature into its Gemini models and released it as an open-source project.
SynthID for Text works by subtly adjusting token probability scores during text generation. This process creates a watermark pattern without affecting the output's quality or creativity, according to Google DeepMind.
The technology functions across multiple languages, but has limitations with heavily edited text. Google DeepMind has made SynthID available on Hugging Face and as part of its Responsible Generative AI Toolkit.

Google DeepMind has extended its SynthID AI watermarking technology to text. The company is integrating this feature into its Gemini models and releasing it as an open-source project.

SynthID for Text intervenes directly in the text generation process of large language models (LLMs). These models generate text token by token, with a token being a single character, a word, or part of a phrase.

As an LLM creates a text sequence, it predicts the most likely next token based on previous words and probability scores for potential tokens. SynthID slightly adjusts these probability scores, but only when it won't affect the output's quality, accuracy, or creativity.
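The idea of nudging probability scores can be illustrated with a minimal Python sketch. This is a simplified stand-in and not DeepMind's implementation: the function names `g_value` and `adjust_probs`, the key value, and the example tokens are all hypothetical. A secret key and the recent context assign each candidate token a pseudorandom bit, and tokens whose bit is 1 receive a slightly higher probability. When the model is already certain about the next token, the distribution stays unchanged, which mirrors the claim that the adjustment only applies where it does not harm the output.

```python
import hashlib


def g_value(key: int, context: tuple, token: str) -> int:
    """Pseudorandom bit derived from a secret key, the recent context, and a candidate token."""
    payload = repr((key, context, token)).encode("utf-8")
    return hashlib.sha256(payload).digest()[0] & 1


def adjust_probs(probs: dict[str, float], context: tuple, key: int) -> dict[str, float]:
    """Shift probability mass toward tokens whose keyed bit is 1 (illustrative rule only).

    q is the total probability of the "bit = 1" tokens. Scaling each token by
    (1 + g - q) keeps the distribution normalized: favored tokens gain a little,
    the others lose a little, and a one-token distribution is left as it is.
    """
    q = sum(p for tok, p in probs.items() if g_value(key, context, tok) == 1)
    return {tok: p * (1 + g_value(key, context, tok) - q) for tok, p in probs.items()}


# Hypothetical next-token probabilities from a language model.
probs = {"mango": 0.5, "lychee": 0.3, "papaya": 0.2}
adjusted = adjust_probs(probs, context=("My", "favorite", "fruit", "is"), key=42)
```

Sampling the next token from `adjusted` instead of `probs` leaves a statistical trace that only someone holding the key can check.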

Google's SynthID for Text manipulates the prediction probabilities for tokens to create an AI text watermark. | Video: Google DeepMind

Google DeepMind explains that this process repeats for all generated text. A single sentence could contain ten or more adjusted probability scores, while an entire page might have hundreds. The final pattern of scores - both for the model's word choices and the adjusted probabilities - forms the watermark.

According to Google DeepMind, this technique can be applied to as few as three sentences. For longer texts, the watermark becomes more robust and accurate. While the method works well across languages, it is less reliable on AI text that has been heavily edited.
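Why longer texts give a more reliable signal can be seen in a toy detector. The sketch below is an assumption-laden illustration, not the published detection method: the vocabulary, the uniform stand-in for model probabilities, the key, and the functions `generate` and `detection_z` are all hypothetical. The detector recomputes the keyed bit for every token and measures how far the share of 1-bits deviates from the 50 percent expected in unwatermarked text.

```python
import hashlib
import math
import random

VOCAB = [f"w{i}" for i in range(50)]  # toy vocabulary, purely illustrative


def g_value(key, context, token):
    """Pseudorandom bit derived from a secret key, the context, and a token."""
    payload = repr((key, context, token)).encode("utf-8")
    return hashlib.sha256(payload).digest()[0] & 1


def generate(n_tokens, key, watermark, rng):
    """Sample tokens from a uniform toy 'model', optionally biased toward bit = 1."""
    tokens = ["<s>"]
    for _ in range(n_tokens):
        context = (tokens[-1],)
        weights = [1.0 / len(VOCAB)] * len(VOCAB)  # stand-in for model probabilities
        if watermark:
            q = sum(w for t, w in zip(VOCAB, weights) if g_value(key, context, t))
            weights = [w * (1 + g_value(key, context, t) - q)
                       for t, w in zip(VOCAB, weights)]
        tokens.append(rng.choices(VOCAB, weights=weights, k=1)[0])
    return tokens[1:]


def detection_z(tokens, key):
    """z-score of the number of bit = 1 tokens against the 50 percent chance level."""
    if not tokens:
        return 0.0
    prev, hits = "<s>", 0
    for tok in tokens:
        hits += g_value(key, (prev,), tok)
        prev = tok
    n = len(tokens)
    return (hits - 0.5 * n) / math.sqrt(0.25 * n)


rng = random.Random(0)
marked = generate(2000, key=42, watermark=True, rng=rng)
plain = generate(2000, key=42, watermark=False, rng=rng)

z_short = detection_z(marked[:20], 42)  # only a few tokens: weak evidence
z_long = detection_z(marked, 42)        # many tokens: strong evidence
z_plain = detection_z(plain, 42)        # unwatermarked text stays near zero
```

In this toy setup the score grows roughly with the square root of the text length, so a short snippet yields weak evidence while a long passage yields a clear signal. Replacing tokens after generation, as an editor would, breaks the context-to-token pairs the detector relies on, which is one way to picture the weakness with edited text.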

Gemini integration and open-source release

Google DeepMind has integrated SynthID into the Gemini app and website to watermark and identify generated texts. The technology is also available as an open-source project on GitHub, in the Google Responsible Generative AI Toolkit, and on Hugging Face.

Google DeepMind has published a detailed description of the technology in the scientific journal Nature. The company claims SynthID performs better than existing text watermarking systems. Previously, Google DeepMind introduced SynthID for images, voices, and music.

Source: Nature | Google DeepMind | GitHub