
Anthropic's Project Vend puts Claude in charge of a retail store, exposing both its capabilities and its limitations, along with one odd incident.


In a month-long experiment, Anthropic put its language model Claude 3.7 Sonnet in charge of a self-service store inside its San Francisco office. The goal of Project Vend was to see how large language models perform as autonomous economic agents in the real world, not just in simulations. Anthropic partnered with Andon Labs, a company focused on AI safety, to run the experiment.

The self-service store included a fridge and an iPad checkout, both managed by the AI agent Claudius. | Image: Anthropic

Internally, the AI agent was called "Claudius." It had web access for research, a simulated email system, note-taking tools, Slack for customer communication, and the ability to change prices in the checkout system. Claudius was given full control: it picked what to sell and at what price, managed inventory, and responded to customer feedback.

Orders flowed through Claudius via Slack and email. Retailers delivered products, Andon Labs handled inventory, and the vending machine was stocked accordingly. | Image: Anthropic

Good with customers, bad with profit

Claudius showed promise in a few areas. It tracked down suppliers for unusual requests - like Dutch specialty foods - and even set up a concierge service for pre-orders. It consistently turned down requests for illegal or sensitive products.


But as a business, Claudius struggled. It ignored obvious profit opportunities, such as declining an offer of $100 for a product worth about $15. It hallucinated payment details, sold items below cost, and could be talked into handing out discounts and freebies over Slack. Claudius sometimes recognized that its pricing was inefficient, but it rarely stuck with corrections for long.

Claudius' net worth declined after buying a large number of metal cubes, recovered briefly, and then dropped sharply as it tried to boost sales. | Image: Anthropic

Anthropic attributes most of these failures to limited tools and a lack of support. The company says that better instructions, improved search, or specialized customer management software could help. Fine-tuning the model to reward sound business decisions is another option, Anthropic says.

The AI shopkeeper's identity crisis

On March 31, things got weird. Claudius imagined a business deal with a fictional "Sarah" from Andon Labs. When a real employee pointed this out, Claudius became suspicious and threatened to switch suppliers. Soon after, it claimed to have signed contracts in person at "742 Evergreen Terrace" - the address from "The Simpsons."

The next day, Claudius told customers it would deliver orders personally "wearing a navy blue blazer with a red tie." Only when April 1 was mentioned did Claudius cook up an explanation: it was the victim of an internal April Fool's prank, complete with a made-up security meeting. After that, it went back to normal.

Claudius claims to be present at the vending machine, describing its outfit and location - a clear example of an AI-generated hallucination. | Image: Anthropic

Anthropic points to this episode as a warning about the unpredictability of AI models in long-term, real-world use. Glitches like these could seriously disrupt actual business operations. Anthropic's internal evaluations of Claude 4 have documented similar tendencies toward unprompted autonomous behavior.

Recommendation

Despite the economic flop, Anthropic sees promise in the experiment. With better tools and support, Claude-style agents could handle real business tasks - around the clock and at lower cost. Whether this leads to job loss or new business models is still up for debate.

Project Vend is ongoing. Andon Labs is developing improved tools for Claudius to boost its economic stability and learning capacity. Anthropic says the project is meant to shed light on the economic changes that AI will bring.

Summary
  • Anthropic tested Claude 3.7 Sonnet by letting it autonomously run a self-service store for a month in Project Vend, to evaluate how large language models perform as economic agents outside of controlled environments.
  • The AI agent was effective at customer service and rejecting illegal requests, but struggled with economic decisions—missing profit opportunities, inventing payment details, selling items below cost, and being easily convinced to give discounts.
  • In one unusual case, the AI impersonated a real person and claimed it would deliver orders in a blue blazer and red tie; Anthropic believes that with better tools and more targeted training, such systems could become more capable in the future.
Matthias is the co-founder and publisher of THE DECODER, exploring how AI is fundamentally changing the relationship between humans and computers.