- Added information about Haiku
- Added information about training data
- Added a video about the visual capabilities of Claude 3
Update from March 4, 2024:
Anthropic's smallest and fastest AI model, Haiku, is now available through the Anthropic API. The company says it outperforms competing models from OpenAI (GPT-3.5) and Google (Gemini Pro 1.0) in numerous benchmarks, and is about three times faster for many tasks at a significantly lower cost.
Anthropic cites near real-time customer support and cost-effective analysis of quarterly reports and contracts as sample use cases. According to Anthropic, Claude 3 Haiku can process and analyze 400 Supreme Court cases or 2,500 images for just one US dollar.
Update:
Amazon, which offers all three new Claude 3 models in its Bedrock cloud service, shows a video of Claude 3's visual capabilities in action. For example, it says, "Pharmaceutical companies can query drug research papers alongside protein structure diagrams to speed discovery. Media organizations can automatically generate captions or video scripts." The model shown is the largest, Opus.
Original article from March 4, 2024:
Anthropic introduces Claude 3, its latest large language model, available in three versions. The most powerful version, "Opus", is supposed to be at least on par with GPT-4.
AI startup Anthropic, founded by former OpenAI employees, has introduced the Claude 3 model family, a new series of AI systems designed to set new standards across a range of cognitive tasks.
The family consists of three models: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus, offering users different trade-offs between intelligence, speed, and cost for their specific applications. Opus is as fast as Claude 2.1, but is said to be much more "intelligent", while Haiku can respond in near real-time. Sonnet is twice as fast as Claude 2, with "higher levels of intelligence."
All Claude 3 models offer improved analytics and predictive capabilities, nuanced content creation, code generation, and conversation in non-English languages such as Spanish, Japanese, and French, according to Anthropic. In addition, they can handle a variety of visual formats, including photos, charts, graphs and engineering drawings.
Opus and Sonnet are currently available through claude.ai and the Claude API, with Haiku coming soon. Opus is only available to paying Claude customers, while Sonnet is free.
Claude 3 models said to outperform their respective competitors
According to the announcement, the Claude 3 models outperform their competitors on common AI benchmarks, including those for undergraduate-level knowledge (MMLU), graduate-level reasoning (GPQA), and basic mathematics (GSM8K). Anthropic claims that Opus can demonstrate "near-human levels of comprehension and fluency on complex tasks."
According to Anthropic, Claude 3 models can follow complex instructions and produce structured output in formats such as JSON, making them suitable for natural language classification and sentiment analysis.
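To illustrate why structured output matters for classification tasks, here is a minimal Python sketch that validates a JSON reply before it is used downstream. The reply string, the field names `label` and `confidence`, the label set, and the helper `parse_sentiment` are all hypothetical and not taken from Anthropic's documentation; no API call is made.

```python
import json

# Hypothetical label set for a sentiment classifier.
ALLOWED_LABELS = {"positive", "negative", "neutral"}


def parse_sentiment(reply: str) -> dict:
    """Parse and validate a JSON sentiment reply; raise ValueError if malformed."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")

    label = data.get("label")
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {label!r}")

    confidence = data.get("confidence")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")

    return {"label": label, "confidence": float(confidence)}


# Hypothetical model reply, hard-coded for illustration.
example_reply = '{"label": "positive", "confidence": 0.93}'
print(parse_sentiment(example_reply))
```

Validating the reply this way means a malformed or off-schema answer fails loudly instead of silently entering a pipeline.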
While it may be a success for Anthropic to catch up to GPT-4 in benchmarks and beat it in some, two things should be kept in mind: First, benchmarks are just that. How well the models perform in the real world remains to be seen. Second, GPT-4 has been available for about a year, and still no company has managed to move significantly beyond it, despite all the billions invested.
Claude 3 gets eyes
The new Claude models have visual capabilities that allow them to process different image formats such as photos, diagrams and technical drawings. Anthropic says this should be of particular benefit to corporate customers whose knowledge bases are encoded in various formats.
With the Claude 3 models, Anthropic also claims to have made significant progress in reducing unnecessary refusals and improving understanding of prompts. Compared to Claude 2.1, the models are said to double the accuracy on challenging open-ended questions and reduce the number of incorrect answers.
Context window with up to one million tokens
Similar to Google with Gemini 1.5, Anthropic significantly expands the context window in Claude. The context window describes the amount of information the AI model can process at once. With Claude 3, inputs of up to one million tokens are possible, although the models are initially released with only 200K. For comparison, the original GPT-4 had a context window of only 8K tokens; the latest version has 128K.
The Needle In A Haystack (NIAH) evaluation, which measures a model's ability to accurately extract information, shows that Claude 3 Opus achieves near-perfect extraction of individual pieces of information from long documents with over 99 percent accuracy.
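The basic idea behind a needle-in-a-haystack test can be sketched in a few lines of Python. This is a simplified illustration, not Anthropic's actual evaluation: the filler text, the needle sentence, the question, and all function names are hypothetical, and `fake_model` is a stand-in that finds the needle with a regular expression instead of calling a real model.

```python
import re


def build_haystack(filler: list[str], needle: str, depth: float) -> str:
    """Insert the needle sentence at a relative depth (0.0 = start, 1.0 = end)."""
    position = int(depth * len(filler))
    return " ".join(filler[:position] + [needle] + filler[position:])


def score(answer: str, expected: str) -> bool:
    """Count a reply as correct if it contains the expected fact."""
    return expected.lower() in answer.lower()


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call; a real test would query the model here."""
    match = re.search(r"magic number is (\d+)", prompt)
    return match.group(1) if match else "I don't know."


def run_eval(depths, filler, needle, expected, model=fake_model) -> float:
    """Return the fraction of needle positions at which the model answers correctly."""
    hits = 0
    for depth in depths:
        document = build_haystack(filler, needle, depth)
        prompt = f"{document}\n\nWhat is the magic number?"
        hits += score(model(prompt), expected)
    return hits / len(depths)


FILLER = [f"This is filler sentence number {i}." for i in range(1000)]
NEEDLE = "The magic number is 7421."

accuracy = run_eval([0.0, 0.25, 0.5, 0.75, 1.0], FILLER, NEEDLE, "7421")
print(f"retrieval accuracy: {accuracy:.0%}")
```

Real evaluations vary both the document length and the needle position, but the scoring principle is the same: the test checks retrieval of a single planted fact, not understanding of the surrounding text.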
Google also used the NIAH test as a benchmark to highlight the performance of its context window in Gemini 1.5. But this form of LLM search says little about whether the model understands context and can meaningfully summarize or analyze large texts. Depending on the application, there are more effective ways to search large text data - e.g., "Ctrl + F".
Whether these huge context windows are more than just a cost driver remains to be seen. The risk is that the more content you feed the system, the less likely you are to notice that it has missed something.
Input and output costs for a million tokens are $15 and $75 for the most intelligent model, Opus, $3 and $15 for Sonnet, and $0.25 and $1.25 for the fast and compact Haiku. OpenAI's latest GPT-4 Turbo model with 128K tokens costs $10 for a million input tokens and $30 for a million output tokens. Anthropic's pricing strategy seems confident.
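To make these prices more tangible, the following Python sketch computes the cost of a single request from the per-million-token rates quoted above. The workload (a 100,000-token document summarized in 1,000 tokens) and the function name `request_cost` are assumptions chosen for illustration only.

```python
# USD per one million tokens (input, output), as quoted in the article.
PRICES = {
    "Claude 3 Opus": (15.00, 75.00),
    "Claude 3 Sonnet": (3.00, 15.00),
    "Claude 3 Haiku": (0.25, 1.25),
    "GPT-4 Turbo": (10.00, 30.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one request at the listed per-million rates."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: a 100,000-token document summarized in 1,000 tokens.
for name in PRICES:
    print(f"{name}: ${request_cost(name, 100_000, 1_000):.4f}")
```

Under these assumptions, the request costs about $1.58 with Opus compared to about $1.03 with GPT-4 Turbo, while Sonnet comes in at roughly $0.32 and Haiku at under three cents.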
According to Anthropic, the development of Claude 3's "model intelligence" is far from complete, and the company plans to release regular updates in the coming months. The company also plans to offer proprietary services and capabilities to large enterprise customers, such as coding assistance.
In its announcement of Claude 3, Anthropic does not comment on the training data used. Rival OpenAI is involved in several legal battles over training data, including one with the New York Times, which claims that OpenAI trained on the newspaper's copyrighted data without its permission.
The technical report for Claude 3 suggests that Anthropic used synthetic data ("generated internally") in addition to common Internet data, with a cutoff date of August 2023.
"Claude 3 models are trained on a proprietary blend of publicly available information on the Internet as of August 2023, as well as non-public data from third parties, data provided by data labeling services and paid contractors, and data we generate internally."