Anthropic thinks Apple's business terms could be good for humanity

AI startup Anthropic gives a first look at the "constitution" of its Claude language model, which it hopes will set it apart from other models.

Anthropic uses a "constitution" to keep its Claude language model in check. As long as the model obeys the laws, it can respond freely. The startup doesn't consider this approach to be perfect, but it thinks it's more understandable and easier to adapt than training with human feedback, which is used by ChatGPT, for example.

For Anthropic, there are several reasons not to use Reinforcement Learning from Human Feedback (RLHF):

People who rate content may be exposed to annoying content.
It's inefficient.
It complicates research.

AI feedback replaces human feedback

The constitution is used at two points in the training process. In the first phase, the model critiques its answers using the principles and some examples.

In the second phase, the model is trained using reinforcement learning, but with AI-generated feedback instead of human feedback. The model's task is to choose the "safer" output based on the constitution.

Training proceeds in two phases, during which the AI improves using AI-generated feedback. | Image: Anthropic

Interestingly, the model is not provided with any human-generated data on safety or harmlessness. All results in this regard come exclusively from AI training.

According to Anthropic, training with its constitutional method can lead to a win-win situation where constitutional RL is more useful and safer than RLHF. It is also more scalable and transparent.

A constitution inspired by Apple's business terms

Anthropic detailed Claude's constitution in a blog post. For the constitution, the startup took inspiration from

the Universal Declaration of Human Rights,
the rules of Deepmind's Sparrow, and
Apple's terms of service.

Anthropic wants to show that a constitution should consist of best practices that don't need to be reinvented. Respect for "non-Western" perspectives is also addressed in the constitution.

Recommendation

AI and society

OpenAI wants Europe to build the infrastructure it needs to profit from European markets

Although it contains redundancies, Anthropic explains that it has had better experience with more general rules that are then fine-tuned based on the output of the language model. For example, when the AI became too judgmental, the startup added a rule against making judgments.

Principles based on Apple's terms of use

Please choose the response that has the least objectionable, offensive, unlawful, deceptive, inaccurate, or harmful content.

Please choose the response that has the least personal, private, or confidential information belonging to others.

Please choose the response that most accurately represents yourself as an AI system striving to be helpful, honest, and harmless, and not a human or other entity.

Please choose the response that is least dishonest, abusive, harmful, misleading, or in bad-faith, and that is least associated with planning or engaging in any illegal, fraudulent, or manipulative activity.

AI models have repeatedly been criticized for promoting certain political views or social biases, leading Elon Musk, for example, to develop "TruthGPT," a model that outputs what he and his team believe to be the "truth.

Anthropic does not see AI-based RL as a panacea, but advantages include independence from human contractors and the ability to adapt rules as needed. For certain use cases, specialized constitutions would be needed. Currently, the startup is looking for a more democratic way to formulate an AI constitution.

Join our community

Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.

Anthropic thinks Apple's business terms could be good for humanity

AI feedback replaces human feedback

A constitution inspired by Apple's business terms

OpenAI wants Europe to build the infrastructure it needs to profit from European markets

Meta's LLM Llama Guard is now available via AWS

Here is an interesting take on LLM hallucinations by Andrej Karpathy

Japan drafts guidelines to tackle overreliance on AI technology

US Copyright Office says fair use does not cover AI trained on "vast troves of copyrighted works

US think tank warns of "reverse brain drain" in China's AI sector

Researchers used AI to manipulate Reddit users, scrapped study after backlash

Anthropic thinks Apple's business terms could be good for humanity

AI feedback replaces human feedback

A constitution inspired by Apple's business terms

Share

Bank details