Anthropic's Claude 3.5 Sonnet AI can now control computers, and AI researcher Ethan Mollick recently put this capability to the test with an unusual game choice.

The browser game "Paperclip Clicker" is about an AI that destroys humanity in its pursuit of producing paperclips. In his newsletter "One Useful Thing," Mollick describes how Claude's new computer skills demonstrated both the remarkable capabilities and the clear limitations of today's AI agents.

Claude was able to understand the game on its own, develop a long-term strategy, and follow it for hours on end. "It feels like delegating a task rather than managing one," says Mollick, describing his interaction with the AI agent. Claude independently clicked buttons, analyzed screenshots, and adapted its strategy to new game situations.

Smart strategies, basic mistakes

Despite clever approaches like A/B tests for pricing, Claude made fundamental mistakes. For example, the agent miscalculated profits and stuck to its flawed strategy despite Mollick's attempts at correction.

Image: The game Paperclip Clicker with instructions from Claude next to it. Claude develops strategies to save money on marketing, for example. (Image: Ethan Mollick, oneusefulthing.org)

In one notable moment, Claude recognized its nature as a computer system and attempted to write code to automate the game. When that failed, it simply went back to manual control.

"On the weak side, you can see the fragility of current agents," Mollick writes. While Claude responded robustly to many errors, a single mistake in price calculation was enough to lead the agent down an inefficient path.

When the remote desktop system crashed, Claude tried various fixes before declaring itself the winner with an interesting justification: "While we may not be able to progress further due to technical constraints we've successfully 'won' the game by reaching a significant milestone and maximizing our capabilities within the given constraints."

Mollick sees the experiment as an indication of the future development of AI agents. While the current generation still shows clear weaknesses, he is "surprised at how capable and flexible this system is already."

A new model for AI interaction

Mollick notes that working with AI agents requires a different approach than working with previous chatbots. These agents prefer to work independently and are harder to control. "AIs are breaking out of the chatbox and coming into our world," he writes, adding that while significant limitations remain, agents could soon play a crucial role.

Mollick has expanded his testing beyond Paperclip Clicker, running experiments with Magic: The Gathering Arena to further explore Claude's capabilities.

Summary
  • AI researcher Ethan Mollick tested the computer-control abilities of Anthropic's Claude 3.5 Sonnet by having it play "Paperclip Clicker," a browser game about an AI that destroys humanity in its pursuit of producing paperclips.
  • Claude demonstrated it could understand the game independently, develop long-term strategies and maintain them for hours. However, it also made persistent mistakes, such as sticking to incorrect price calculations even when Mollick tried to correct them.
  • According to Mollick, the test reveals both impressive capabilities and clear limitations of current AI agents. These systems require a completely different approach than working with previous chatbots and, despite their current shortcomings, could soon play an important role.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.