In the midst of OpenAI's existential crisis, competitor Anthropic unveils its new language model and chatbot, Claude 2.1. It has a context window twice the size of its predecessor and is said to make fewer mistakes.
Claude 2.1's 200K context window doubles the already large 100K window of its predecessor, which GPT-4 Turbo had overtaken in early November with a 128K window. At 200K tokens, Anthropic once again offers the largest context window on the market.
The context window describes how much content the language model can consider at once when generating an answer. For Claude 2.1, that is approximately 150,000 words, or more than 500 pages of material, according to Anthropic.
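Anthropic's figures line up with common rules of thumb for English text. A back-of-envelope check, assuming roughly 0.75 words per token and 300 words per printed page (approximations, not the behavior of Anthropic's actual tokenizer):

```python
# Rough conversion from tokens to words and pages.
# Assumed ratios: ~0.75 English words per token, ~300 words per page.
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 300

def context_in_words(tokens: int) -> int:
    return int(tokens * WORDS_PER_TOKEN)

def context_in_pages(tokens: int) -> int:
    return context_in_words(tokens) // WORDS_PER_PAGE

print(context_in_words(200_000))  # 150000 words
print(context_in_pages(200_000))  # 500 pages
```

With these ratios, 200K tokens works out to about 150,000 words and 500 pages, matching the numbers Anthropic cites.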
Chatting with the Iliad
Users can upload entire code bases, financial reports, or even large literary works like the Iliad or the Odyssey for the model to process, Anthropic says.
Claude can perform tasks such as summarization, question answering, trend prediction, and comparing multiple documents. Generating an answer can take several minutes, though that is little compared to the hours of human work it replaces, Anthropic points out.
In practice, however, the benefits of these large context windows are still limited. Tests show that large language models retrieve content less reliably when it sits deep in the input, particularly in the middle, the so-called "lost in the middle" phenomenon. The larger the input, the greater the risk of error.
In practice, this means that you can feed in large documents, but parts of them may be ignored in the analysis. The model retrieves information most reliably from the beginning of a document, as benchmarks of GPT-4 Turbo show.
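This effect is typically measured with "needle in a haystack" tests: a single fact is planted at varying depths of a long filler document, and the model is asked to retrieve it. A minimal sketch of how such test prompts can be built (the filler text, needle, and question are placeholders; actually scoring a model requires sending each prompt to its API):

```python
def build_needle_prompt(filler_sentences: list[str], needle: str, depth: float) -> str:
    """Insert `needle` at a relative position `depth` (0.0 = start,
    1.0 = end) of a filler document and append a retrieval question."""
    pos = int(len(filler_sentences) * depth)
    doc = filler_sentences[:pos] + [needle] + filler_sentences[pos:]
    return " ".join(doc) + "\n\nWhat is the secret number mentioned above?"

filler = [f"This is filler sentence number {i}." for i in range(1000)]
needle = "The secret number is 4217."

# One prompt per depth; a real test would query the model with each
# prompt and check whether the answer contains 4217.
prompts = [build_needle_prompt(filler, needle, d) for d in (0.0, 0.5, 1.0)]
```

Plotting retrieval accuracy against depth is exactly how the GPT-4 Turbo results mentioned above were produced: accuracy tends to dip when the needle sits mid-document.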
Independent benchmarks will show how well Claude 2.1 performs here. In any case, Anthropic promises significant improvements over its predecessor, especially for longer contexts.
The model shows a 30 percent reduction in incorrect answers and a "3-4x lower rate of mistakenly concluding a document supports a particular claim."
When uncertain, the model declines to answer nearly twice as often as its predecessor, admitting uncertainty ("I'm not sure what the fifth largest city in Bolivia is") rather than guessing.
Claude 2.1 is said to be more honest and to understand content better
According to Anthropic, Claude 2.1 halves the hallucination rate compared to its predecessor, Claude 2.0, which the company says lets organizations build AI applications with greater confidence and reliability.
With the new model, Anthropic is also introducing a beta feature called Tool Usage, which enables Claude to integrate with users' existing processes, products, and APIs. Claude can now orchestrate developer-defined functions or APIs, search web sources, and retrieve information from private knowledge bases.
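Anthropic has not published full details of the beta in this announcement, but the general pattern behind such tool use is: the model emits a structured call to a developer-defined function, the application executes it, and the result is fed back into the conversation. A purely illustrative dispatcher for that pattern (all function names and the JSON shape are hypothetical, not Anthropic's actual API):

```python
import json

# Hypothetical developer-defined tool; in a tool-use setup the model
# chooses a function like this and supplies its arguments.
def get_stock_price(ticker: str) -> float:
    prices = {"ACME": 123.45}  # stand-in for a real market-data lookup
    return prices[ticker]

TOOLS = {"get_stock_price": get_stock_price}

def dispatch(tool_call_json: str) -> str:
    """Execute a model-emitted tool call and return the result as a
    JSON string to be inserted back into the conversation."""
    call = json.loads(tool_call_json)
    result = TOOLS[call["name"]](**call["arguments"])
    return json.dumps({"name": call["name"], "result": result})

# Example of a call the model might emit:
print(dispatch('{"name": "get_stock_price", "arguments": {"ticker": "ACME"}}'))
```

The application, not the model, runs the function; the model only decides which tool to call and with what arguments.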
The developer console for Claude API users has been simplified to make it easier to test new calls and shorten the learning curve. The new Workbench lets developers iterate on prompts in a playground-like environment and access new model settings that tweak Claude's behavior.
Claude 2.1 is now available via the API and powers the chat interface on claude.ai for both the free and Pro plans. The 200K-token context window is reserved for Claude Pro users. Claude is currently available in 95 countries, but not in the EU.
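For orientation, a request to the model follows Anthropic's documented text-completion format, where the prompt alternates "\n\nHuman:" and "\n\nAssistant:" turns. A sketch that only builds the request payload (actually sending it requires an API key; the question text is a made-up example):

```python
import json

# Build a text-completion request body for Claude 2.1 using the
# documented "\n\nHuman: ... \n\nAssistant:" prompt format.
def build_request(question: str, max_tokens: int = 256) -> dict:
    return {
        "model": "claude-2.1",
        "prompt": f"\n\nHuman: {question}\n\nAssistant:",
        "max_tokens_to_sample": max_tokens,
    }

payload = build_request("Summarize the Iliad in three sentences.")
print(json.dumps(payload, indent=2))
```

The same payload shape works whether it is sent with Anthropic's Python SDK or as raw JSON over HTTP.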
Today's launch of Claude 2.1 could be a strategic move: Anthropic's competitor OpenAI is in deep crisis, and the heavily criticized OpenAI board is said to have even approached Anthropic's CEO about a merger. In addition, more than 100 OpenAI customers are said to have inquired about Anthropic's offerings.