- Added new Haiku 3.5 model
Update November 4, 2024:
Anthropic has released its new Claude 3.5 Haiku model, now available through the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI.
According to Anthropic, the model shows improved abilities in code generation, tool use, and logical reasoning. It outperforms Claude 3 Opus, the largest model of the previous generation, on many benchmarks while costing about one-fifteenth as much.
Pricing starts at $1 per million input tokens and $5 per million output tokens, about four times the price of the first Haiku model, Claude 3 Haiku. The older model remains available and includes vision capabilities not yet present in the new Haiku.
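The price relationships above can be checked with a little arithmetic. The following is a minimal sketch, not an official calculator: only the $1 and $5 rates come from this article, while the Claude 3 Haiku and Claude 3 Opus rates are assumptions chosen to match the "about four times" and "about 15 times" ratios reported here. The `Pricing` class and the `cost_ratio` helper are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-token list prices, in US dollars per million tokens."""
    input_per_mtok: float
    output_per_mtok: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Return the cost in dollars for one workload, without discounts."""
        total = (input_tokens * self.input_per_mtok
                 + output_tokens * self.output_per_mtok)
        return total / 1_000_000


# Rates stated in the article.
HAIKU_3_5 = Pricing(input_per_mtok=1.00, output_per_mtok=5.00)

# Assumed rates (not stated in the article), consistent with its ratios.
HAIKU_3 = Pricing(input_per_mtok=0.25, output_per_mtok=1.25)
OPUS_3 = Pricing(input_per_mtok=15.00, output_per_mtok=75.00)


def cost_ratio(a: Pricing, b: Pricing,
               input_tokens: int, output_tokens: int) -> float:
    """How many times more expensive `a` is than `b` for the same workload."""
    return a.cost(input_tokens, output_tokens) / b.cost(input_tokens, output_tokens)


if __name__ == "__main__":
    workload = (2_000_000, 500_000)  # input tokens, output tokens
    print(f"Claude 3.5 Haiku: ${HAIKU_3_5.cost(*workload):.2f}")
    print(f"vs. Claude 3 Haiku: {cost_ratio(HAIKU_3_5, HAIKU_3, *workload):.1f}x")
    print(f"Claude 3 Opus vs. Claude 3.5 Haiku: {cost_ratio(OPUS_3, HAIKU_3_5, *workload):.1f}x")
```

Because both the input and output rates scale by the same factor under these assumed prices, the ratios do not depend on the mix of input and output tokens in the workload.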
Users can reduce costs through prompt caching and batch processing, similar to other Anthropic models. Anthropic recommends the model for tasks where latency is critical, such as end-user chatbots.
Original article from October 22, 2024:
Anthropic launches smarter Claude models with computer skills
Anthropic has announced upgrades to its Claude AI models, including an enhanced Claude 3.5 Sonnet and a new Claude 3.5 Haiku. The company is also introducing a new feature that allows the model to interact directly with computer interfaces.
The updated Claude 3.5 Sonnet shows significant improvements in programming tasks. Its score on the SWE-bench Verified test increased from 33.4% to 49.0%, which Anthropic claims outperforms all publicly available models, including specialized programming systems.
Sonnet also improved on TAU-bench, a test for agentic tool use. In the retail domain, its score rose from 62.6% to 69.2%, while in the more challenging aviation domain, it improved from 36.0% to 46.0%.
New Haiku model outperforms previous flagship
Anthropic is also introducing a new Claude 3.5 Haiku model. The company claims that this model outperforms the previous top-of-the-line Claude 3 Opus on many benchmarks, with speed and cost similar to those of the previous Claude 3 Haiku. Notably, Anthropic did not mention any plans for a new Opus model in this announcement.
The new Claude 3.5 Haiku demonstrates impressive capabilities relative to its speed and cost in programming tasks. It scores 40.6% on the SWE-bench Verified test, which Anthropic says exceeds the performance of many agents based on "publicly available state-of-the-art models," including GPT-4o.
As for knowledge cutoff dates, Claude 3.5 Sonnet is current through April 2024, while the new Haiku model has information up to July 2024. Anthropic plans to release Claude 3.5 Haiku later this month.
AI-driven computer interaction
Anthropic describes its new "computer use" feature as a significant innovation. Rather than developing specific tools for individual tasks, the company is taking a broader approach by teaching Claude general computer skills. This allows the AI to use various standard tools and software programs originally designed for human use.
Anthropic has developed an API that enables Claude to perceive and interact with computer interfaces. Developers can integrate this API to allow Claude to translate instructions like "Use data from my computer and the internet to fill out this form" into actual computer commands.
The system can move the mouse pointer, click on screen elements, and enter information using a virtual keyboard. In the OSWorld benchmark, which assesses AI models' ability to use computers in a human-like manner, Claude 3.5 Sonnet scored 14.9% in the "screenshots only" category. While this is significantly higher than the next best AI system at 7.8%, it still falls far short of human capabilities.
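To make the three kinds of action described above more concrete, here is a minimal sketch of how an application might dispatch model-requested actions. It does not call Anthropic's API and does not reproduce its schema: the action names (`move`, `click`, `type`), the dictionary layout, and the `FakeScreen` class are all hypothetical, and the "screen" is an in-memory stand-in that only records what happened.

```python
from dataclasses import dataclass, field


@dataclass
class FakeScreen:
    """In-memory stand-in for a desktop; nothing here touches a real machine."""
    pointer: tuple = (0, 0)
    clicks: list = field(default_factory=list)
    typed: str = ""


def apply_action(screen: FakeScreen, action: dict) -> str:
    """Apply one hypothetical action to the fake screen and describe the result."""
    kind = action["type"]
    if kind == "move":
        screen.pointer = (action["x"], action["y"])
        return f"pointer moved to {screen.pointer}"
    if kind == "click":
        # A click happens wherever the pointer currently is.
        screen.clicks.append(screen.pointer)
        return f"clicked at {screen.pointer}"
    if kind == "type":
        screen.typed += action["text"]
        return f"typed {len(action['text'])} characters"
    # Reject anything unrecognized rather than guessing at its meaning.
    raise ValueError(f"unsupported action: {kind}")


def run(screen: FakeScreen, actions: list) -> list:
    """Apply actions in order and collect one log line per action."""
    return [apply_action(screen, a) for a in actions]


if __name__ == "__main__":
    screen = FakeScreen()
    plan = [
        {"type": "move", "x": 120, "y": 48},
        {"type": "click"},
        {"type": "type", "text": "Jane Doe"},
    ]
    for line in run(screen, plan):
        print(line)
```

In a real integration, the action list would come from the model's responses rather than a hard-coded plan, and each handler would drive an actual mouse and keyboard. Rejecting unknown actions, as the sketch does, is one simple way to keep such a loop limited to a known set of operations.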
Anthropic recognizes that Claude's current computer interaction skills are imperfect. Some actions that humans find effortless, such as scrolling, dragging, or zooming, are still challenging for Claude. The company recommends that developers start with low-risk tasks when implementing this feature.