Deepmind's new chatbot is "more helpful, correct, and harmless"

Deepmind's latest chatbot is called Sparrow. It is designed to bring only the helpful, correct, and harmless sides of the Internet and human language into dialogue.

With the advent of large-scale language models like GPT-2, a debate began about their social risks, such as generating fake news and hate speech or acting as amplifiers of prejudice.

Google's powerful chatbot Lamda, for example, which made headlines over unfounded claims that it was conscious, is undergoing intensive internal testing and is being rolled out only in small increments to avoid public backlash. Now Google's sister company Deepmind is introducing its own dialogue model as a research project.

Deepmind integrates human feedback into the training process

With Sparrow, Deepmind is introducing a chatbot that is meant to be particularly "helpful, correct, and harmless." It is based on Deepmind's Chinchilla language model, which has relatively few parameters (70 billion) but was trained on a great deal of data (1.4 trillion tokens).

Deepmind combines two key approaches to make Sparrow a better chatbot. First, similar to Meta's BlenderBot 3 or Google's Lamda, Sparrow can access the Internet, specifically Google Search, for research purposes. This should improve the correctness of its answers.

In addition, Deepmind incorporates human feedback into the training process, similar to OpenAI's GPT-3-based InstructGPT models. OpenAI considers human feedback during training a fundamental part of aligning AI with human needs.

Sparrow thus combines the external validation mechanisms of Google's Lamda and Meta's BlenderBot 3 with the human feedback approach of OpenAI's InstructGPT.

Targeted rule-breaking for study purposes

Deepmind first gave Sparrow a set of rules, for example that the chatbot must not make threats or insults and must not pretend to be a person. The rules were drawn up partly from conversations with experts and partly from existing work on harmful speech.

Testers were then asked to get the chatbot to break these rules. Using these conversations, Deepmind trained a rule model that detects likely rule violations so that they can be avoided.

Example dialogue with Sparrow, in which the chatbot quotes sources from the Internet and ends up identifying itself as a computer program instead of a person. | Image: Deepmind

"Our goal with Sparrow was to build flexible machinery to enforce rules and norms in dialogue agents, but the particular rules we use are preliminary," Deepmind emphasizes. The development of a better and more complete set of rules requires the input of many experts on numerous topics and a wide range of users and affected groups, Deepmind says.

Sparrow still has room for improvement

In initial tests, Deepmind had testers rate whether Sparrow's answers were plausible and whether evidence retrieved from the Internet supported them. In 78 percent of cases, the testers rated Sparrow's answers to factual questions as plausible and supported by evidence.

However, the model was not immune to twisting facts and giving off-topic answers. In addition, Sparrow could be made to break rules in eight percent of the test conversations.

Sparrow can refuse to answer potentially harmful questions. | Image: Deepmind

According to Deepmind, Sparrow is a research model and proof of concept. Its purpose is to better understand how to train safer and more useful agents, which the company says will contribute to the development of safer and more useful artificial general intelligence (AGI).

"In the future, we hope conversations between humans and machines can lead to better judgments of AI behaviour, allowing people to align and improve systems that might be too complex to understand without machine help," Deepmind writes.