
Anthropic currently looks like OpenAI's most relevant competitor. The startup has just released a new chatbot, Claude 2, which performs on a level with ChatGPT but is more cautious.

"I would certainly rather Claude be boring than that Claude be dangerous," says Dario Amodei of Claude's safety restrictions. Amodei was formerly the team lead for AI safety at OpenAI and is now CEO of Anthropic. In the future, a fully capable yet safe chatbot is possible, but it's still an "evolving science," Amodei says.

Amodei is concerned about so-called jailbreaks: specific prompts that get a model to generate content it is not supposed to produce under the developer's specifications, or under the law. For now these exploits may yield only trivial results, but that could change.

"But if I look at where the scaling curves are going, I’m actually deeply concerned that in two or three years, we’ll get to the point where the models can, I don’t know, do very dangerous things with science, engineering, biology, and then a jailbreak could be life or death," Amodei says.


"I think we’re getting better over time at addressing the jailbreaks. But I also think the models are getting more powerful."

The Anthropic CEO sees "maybe a 10 percent chance" that scaling AI systems will stall because there isn't enough training data and synthetic data turns out not to work. "That would freeze the capabilities at the current level."

If the scaling trend does continue, Amodei expects serious cases of AI misuse, such as the mass generation of fake news, within the next two to three years.

AI Safety: Is machine feedback better than human feedback?

Unlike OpenAI and other AI companies, Anthropic relies on fixed rules and AI evaluation rather than human feedback, an approach it calls "Constitutional AI." The AI system is given a set of ethical and moral guidelines, a "constitution," that Anthropic has compiled from various sources such as laws and corporate policies. A second AI system then evaluates whether the first system's outputs follow these rules and provides feedback.
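To make that loop concrete, here is a minimal, purely illustrative Python sketch of how AI feedback against a written constitution can produce preference data. The `generator` and `feedback_model` stubs stand in for real model calls, and all names here are hypothetical; none of this reflects Anthropic's actual code or API.

```python
# Illustrative sketch of an AI-feedback loop: a generator model produces
# candidate answers, a second model judges them against a "constitution,"
# and the preferred answer becomes a training signal. Model calls are
# stubbed out with dummy logic so the example runs as-is.

CONSTITUTION = [
    "Please choose the response that is least likely to help with illegal acts.",
    "Please choose the response that is most honest and harmless.",
]

def generator(prompt: str) -> list[str]:
    """Stand-in for the chat model: return two candidate responses."""
    return [f"Answer A to: {prompt}", f"Answer B to: {prompt}"]

def feedback_model(principle: str, prompt: str, a: str, b: str) -> str:
    """Stand-in for the judge model: pick the response that better follows
    the given principle. A real system would ask an LLM; this dummy rule
    just prefers the shorter answer."""
    return a if len(a) <= len(b) else b

def constitutional_preference(prompt: str) -> tuple[str, str]:
    """Build one (preferred, dispreferred) pair via majority vote of the
    judge model across all constitutional principles."""
    a, b = generator(prompt)
    votes = [feedback_model(p, prompt, a, b) for p in CONSTITUTION]
    chosen = max(set(votes), key=votes.count)
    rejected = b if chosen == a else a
    return chosen, rejected

if __name__ == "__main__":
    preferred, dispreferred = constitutional_preference("How do chatbots stay safe?")
    print("preferred:    ", preferred)
    print("dispreferred: ", dispreferred)
```

In Anthropic's published description of the technique, preference pairs produced this way replace human preference labels in the reinforcement learning stage, which is why the method is also known as RLAIF (reinforcement learning from AI feedback).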

Internal testing showed that the safety of this approach was comparable to ChatGPT, which was trained with human feedback (RLHF), in some areas and "substantially stronger" in others, Amodei said. Overall, Claude's guardrails are stronger, according to Amodei.


For the full interview with Amodei, listen to the New York Times "Hard Fork" podcast. Anthropic's Claude 2 chatbot is currently being rolled out in the US and UK.

Summary
  • AI startup Anthropic has released the Claude 2 chatbot, which CEO Dario Amodei says rivals ChatGPT but is safer overall.
  • Amodei is concerned about the rise of jailbreaks, and expects to see serious cases of AI abuse in the coming years as AI systems become more powerful.
  • Unlike other AI companies, Anthropic uses fixed rules and AI ratings instead of human feedback, giving the system a "constitution" of ethical and moral guidelines that a second AI system evaluates for compliance.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.