Content
summary Summary
Update
  • Added new Haiku 3.5 model

Update November 4, 2024:

Ad

Anthropic has released its new Haiku 3.5 model, now available through the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI.

According to Anthropic, the model shows improved abilities in code generation, tool use, and logical reasoning. It outperforms Claude 3 Opus, the largest model of the previous generation, in many benchmarks despite being about 15 times less expensive.

Pricing starts at $1 per million input tokens and $5 per million output tokens, about four times higher than the first Haiku model. The older model remains available and includes vision capabilities not yet present in the new Haiku.

Ad
Ad

Users can reduce costs through prompt caching and batch processing, similar to other Anthropic models. Anthropic recommends the model for tasks where latency is critical, such as end-user chatbots.

Original article from October 22, 2024:

Anthropic launches smarter Claude models with computer skills

Anthropic has announced upgrades to its Claude AI models, including an enhanced Claude 3.5 Sonnet and a new Claude 3.5 Haiku. The company is also introducing a new feature that allows the model to interact directly with computer interfaces.

The updated Claude 3.5 Sonnet shows significant improvements in programming tasks. Its performance on the SWE Bench Verified Test increased from 33.4% to 49.0%, which Anthropic claims outperforms all publicly available models, including specialized programming systems.

Sonnet also made strides in the TAU Bench, a test for agentic tool use. In the retail sector, its performance rose from 62.6% to 69.2%, while in the more challenging aviation sector, it improved from 36.0% to 46.0%.

Recommendation
Table: Comparison of AI models across various benchmarks. Claude 3.5 Sonnet (New) leads in several categories, including GPQA, MMLU, HumanEval, and AIME 2024.
The new sonnet makes the biggest leaps in reasoning and agentic tool testing. | Image: Anthropic

New Haiku model outperforms previous flagship

Anthropic is also introducing a new Claude 3.5 Haiku model. The company claims that this model outperforms the previous top-of-the-line Claude 3 Opus on many benchmarks, while maintaining similar speed and cost as the previous Claude 3 Haiku. Notably, Anthropic did not mention any plans for a new Opus model in this announcement.

Comparison table: AI model performance in various benchmarks, Claude 3.5 Sonnet (new) leading in several categories.
The new Claude 3.5 Sonnet model shows improved performance, especially in logical reasoning, mathematical problem-solving and programming tasks. On the general language comprehension benchmark MMLU, it is only slightly ahead of the old Sonnet 3.5. | Image: Anthropic

The new Claude 3.5 Haiku demonstrates impressive capabilities relative to its speed and cost in programming tasks. It scores 40.6% on the SWE-bench Verified test, which Anthropic says exceeds the performance of many agents based on "publicly available state-of-the-art models," including GPT-4o.

Regarding knowledge cutoff dates, Sonnet 3.5 is current through April 2024, while the new Haiku model has information up to July 2024. Anthropic plans to release Haiku later this month.

AI-driven computer interaction

Anthropic describes its new "computer use" feature as a significant innovation. Rather than developing specific tools for individual tasks, the company is taking a broader approach by teaching Claude general computer skills. This allows the AI to use various standard tools and software programs originally designed for human use.

Ad
Ad
Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.

Anthropic has developed an API that enables Claude to perceive and interact with computer interfaces. Developers can integrate this API to allow Claude to translate instructions like "Use data from my computer and the internet to fill out this form" into actual computer commands.

The system can move the mouse pointer, click on screen elements, and enter information using a virtual keyboard. In the OSWorld benchmark, which assesses AI models' ability to use computers in a human-like manner, Claude 3.5 Sonnet scored 14.9% in the "screenshots only" category. While this is significantly higher than the next best AI system at 7.8%, it still falls far short of human capabilities.

Anthropic recognizes that Claude's current computer interaction skills are imperfect. Some actions that humans find effortless, such as scrolling, dragging, or zooming, are still challenging for Claude. The company recommends that developers start with low-risk tasks when implementing this feature.

Support our independent, free-access reporting. Any contribution helps and secures our future. Support now:
Bank transfer
Summary
  • Anthropic presents improved versions of its AI models Claude 3.5 Sonnet and Claude 3.5 Haiku. Both models achieved significant performance improvements, particularly in programming tasks.
  • The new Claude 3.5 Haiku model is expected to outperform the previous top-of-the-line Claude 3 Opus model in many intelligence benchmarks, at the same cost and similar speed as its predecessor, Claude 3 Haiku.
  • Anthropic introduces a new feature for AI-driven computing. An API enables Claude to perceive computer surfaces, interact with them and translate instructions into concrete computer commands. However, the system is still a long way from human capabilities.
Sources
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.
Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.