Anthropic claims context engineering beats prompt engineering when managing AI agents

Anthropic is looking to move beyond prompt engineering with a new approach it calls "context engineering." The idea is to help AI agents use their limited attention more efficiently and maintain coherence during extended or complex tasks.

Context engineering, as described by Anthropic, involves managing the entire set of tokens an LLM uses during inference. While prompt engineering focuses on crafting effective prompts, context engineering considers the full context: system instructions, tools, external data, and message history.

Two-column comparison diagram: on the left, prompt engineering with a simple context window made up of a system prompt and a user message; on the right, context engineering with a more complex setup of documents, tools, memory files, and message history, where a curation step selects the relevant elements for the final context window. Classic prompt engineering for individual queries compared to context engineering, which lets agents curate context continuously. | Image: Anthropic
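The curation step in the diagram can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Anthropic's implementation. The helpers `count_tokens`, `ContextParts`, and `build_context` are hypothetical, and tokens are approximated by whitespace-separated words. Only message history is trimmed (oldest first), while system instructions, tools, and documents are always kept.

```python
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


@dataclass
class ContextParts:
    system: str                 # system instructions
    tools: list[str]            # tool descriptions
    documents: list[str]        # external data, e.g. retrieved files
    history: list[str] = field(default_factory=list)  # message history, oldest first


def build_context(parts: ContextParts, budget: int) -> list[str]:
    """Assemble a context window that fits `budget` tokens.

    System instructions, tools, and documents are always kept (hypothetical policy);
    message history is added newest-first until the budget runs out.
    """
    fixed = [parts.system, *parts.tools, *parts.documents]
    used = sum(count_tokens(p) for p in fixed)
    if used > budget:
        raise ValueError("fixed context alone exceeds the token budget")
    kept: list[str] = []
    # Walk history from newest to oldest so the most recent turns survive.
    for message in reversed(parts.history):
        cost = count_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    # Restore chronological order for the kept history.
    return fixed + list(reversed(kept))
```

Dropping the oldest turns first is only one possible curation policy. A real agent might instead rank documents by relevance or summarize old history rather than discard it.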

The term "context engineering" isn't entirely new. Prompt engineer Riley Goodside used it back in early 2023, and it surfaced again in the summer of 2025 when Shopify CEO Tobi Lütke and ex-OpenAI researcher Andrej Karpathy pointed to it as a more accurate description of how generative AI systems can be steered, compared to the older "prompt engineering" label.

Strategies for building context

Anthropic advises writing system prompts that are specific enough to guide behavior, yet flexible enough to give the model strong heuristics rather than brittle, hard-coded rules. For tools, the priorities are minimizing functional overlap between them and keeping what they return token-efficient.

A noticeable trend is the move toward "just in time" data strategies. Rather than preloading all information, agents store lightweight identifiers and fetch data only when needed. Anthropic's coding tool Claude Code, for example, analyzes complex data by loading only what it needs, keeping the context window lean.
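A minimal Python sketch of the "just in time" pattern follows. It is an illustration only, not how Claude Code is implemented. The class `JustInTimeContext` and its methods are hypothetical, and file paths serve as the lightweight identifiers. The agent sees only names and paths up front, then pulls in a few lines on demand, similar in spirit to the Unix `head` command.

```python
import os
import tempfile
from itertools import islice


class JustInTimeContext:
    """Keeps lightweight references (file paths) and loads content only on demand."""

    def __init__(self) -> None:
        # Short name -> file path. Only these cheap identifiers sit in the context up front.
        self.refs: dict[str, str] = {}

    def register(self, name: str, path: str) -> None:
        self.refs[name] = path

    def listing(self) -> str:
        """What the agent sees before fetching anything: names and paths, no contents."""
        return "\n".join(f"{name}: {path}" for name, path in sorted(self.refs.items()))

    def head(self, name: str, n: int = 5) -> list[str]:
        """Read only the first n lines of a file instead of loading it whole."""
        with open(self.refs[name], encoding="utf-8") as f:
            return [line.rstrip("\n") for line in islice(f, n)]


if __name__ == "__main__":
    # Demo: a 10,000-line log file, of which only three lines enter the context.
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "server.log")
        with open(log_path, "w", encoding="utf-8") as f:
            f.write("\n".join(f"event {i}" for i in range(10_000)))
        ctx = JustInTimeContext()
        ctx.register("server_log", log_path)
        print(ctx.listing())
        print(ctx.head("server_log", 3))
```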

Diagram on a color scale running from red through green back to red, showing three example system prompts: on the left, an overly specific prompt with detailed step-by-step instructions; in the middle, a balanced prompt with clear but flexible guidelines; on the right, an overly vague prompt with generic statements. Anthropic's prompt calibration guide outlines three approaches: overly specific if-else rules, a balanced middle ground, and vague, generic instructions. | Image: Anthropic

For longer tasks, Anthropic has identified three main tactics:

Compaction: Summarizing a conversation as it nears the context window limit, then continuing in a fresh context that starts from the compressed summary.
Structured notes: Having the agent write notes that persist outside the context window and pull them back in when they are needed.
Sub-agent architectures: Assigning specialized agents to focused tasks, with the main agent receiving only condensed summaries of their work.
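Compaction can be illustrated with a short Python sketch. The function names `compact` and `naive_summary` are hypothetical, and so are the 80 percent trigger threshold and the choice to keep the two most recent messages verbatim. In a real agent, the model itself would write the summary rather than the placeholder summarizer shown here.

```python
from typing import Callable


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def compact(history: list[str], limit: int,
            summarize: Callable[[list[str]], str],
            threshold: float = 0.8, keep_recent: int = 2) -> list[str]:
    """Compress older turns into one summary once history reaches threshold * limit.

    The newest `keep_recent` messages stay verbatim so the agent keeps the
    immediate thread of the task; everything before them is summarized.
    """
    total = sum(count_tokens(m) for m in history)
    split = len(history) - keep_recent
    if total < threshold * limit or split <= 0:
        # Under the trigger point, or nothing old enough to compress.
        return list(history)
    older, recent = history[:split], history[split:]
    return [f"Summary of earlier conversation: {summarize(older)}", *recent]


def naive_summary(messages: list[str]) -> str:
    """Placeholder summarizer that keeps the first three words of each message.

    A real agent would ask the model to write the summary instead.
    """
    return " | ".join(" ".join(m.split()[:3]) for m in messages)
```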

Attention as a bottleneck

These strategies aim to work around the limitations of LLMs. As the number of tokens in the context grows, models often suffer from "context rot": the more tokens there are, the harder it becomes for them to accurately retrieve the right information.

This problem is baked into the transformer architecture. Every token attends to every other token, so for n tokens the number of pairwise relationships grows as n². Because their "attention budget" is limited, LLMs can quickly get overwhelmed as context grows.
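A toy calculation makes the n² growth and the idea of a fixed attention budget concrete. This sketch uses random numbers as stand-ins for the query-key scores a real transformer would compute, and it ignores heads and layers. It shows only the bookkeeping: n tokens produce n × n scores, and softmax forces each token's weights to sum to 1. Spreading attention over more tokens therefore leaves less weight for each one on average.

```python
import math
import random


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax: turns raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(n: int, seed: int = 0) -> list[list[float]]:
    """Build an n x n attention matrix: one row of weights per token.

    Random scores stand in for the query-key scores a real model computes.
    """
    rng = random.Random(seed)
    return [softmax([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(n)]


if __name__ == "__main__":
    for n in (10, 100, 400):
        weights = attention_weights(n)
        pairs = sum(len(row) for row in weights)  # n * n token-to-token scores
        # Each row sums to 1, so the average weight per token shrinks as 1 / n.
        avg = sum(sum(row) for row in weights) / pairs
        print(f"n={n:>4}  pairwise scores={pairs:>7}  average weight per token={avg:.4f}")
```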

Managing memory and tokens

Anthropic's Claude Sonnet 4.5 rollout included a new memory tool, now in public beta. This lets agents build persistent knowledge bases, with developers deciding where and how data gets stored. Claude can create, read, and edit files in a memory directory that carries over between conversations.
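Here is a minimal file-backed sketch of a developer-controlled memory directory. It is not the memory tool's actual interface. The class `MemoryDirectory` and its `create`, `read`, `edit`, and `names` methods are hypothetical, chosen to mirror the create, read, and edit operations described above. Storage is simply a local folder.

```python
import tempfile
from pathlib import Path


class MemoryDirectory:
    """File-backed memory that outlives a single conversation (hypothetical design)."""

    def __init__(self, root: str) -> None:
        # The developer decides where memory lives; here, a local directory.
        self.root = Path(root).resolve()
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, name: str) -> Path:
        path = (self.root / name).resolve()
        # Refuse paths that escape the memory directory, e.g. "../secrets".
        if self.root not in path.parents:
            raise ValueError(f"path outside memory directory: {name}")
        return path

    def create(self, name: str, text: str) -> None:
        path = self._path(name)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")

    def read(self, name: str) -> str:
        return self._path(name).read_text(encoding="utf-8")

    def edit(self, name: str, old: str, new: str) -> None:
        """Replace exactly one occurrence of `old`; fail if it is missing or ambiguous."""
        text = self.read(name)
        count = text.count(old)
        if count != 1:
            raise ValueError(f"expected exactly one match for {old!r}, found {count}")
        self.create(name, text.replace(old, new))

    def names(self) -> list[str]:
        """All memory files, relative to the memory directory."""
        return sorted(str(p.relative_to(self.root)) for p in self.root.rglob("*") if p.is_file())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        MemoryDirectory(tmp).create("notes/project.md", "status: parser refactor in progress")
        # A later "conversation" opens the same directory and finds the note.
        later = MemoryDirectory(tmp)
        later.edit("notes/project.md", "in progress", "done")
        print(later.names(), later.read("notes/project.md"))
```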

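Alongside the memory tool, Anthropic introduced context editing, which automatically clears stale tool calls and results from the context window as it approaches its token limit. Anthropic claims notable gains from these features: in internal tests on agentic search, combining the memory tool with context editing improved performance by 39 percent, while context editing alone brought a 29 percent improvement. In a 100-turn web search evaluation, context editing reportedly cut token consumption by 84 percent.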
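Anthropic describes context editing as clearing stale tool calls and results as the context nears its token limit. The Python sketch below illustrates that idea under simplifying assumptions and does not use the Claude API's actual parameters. The `Message` type, the placeholder text, and the policy of keeping the two newest tool results are hypothetical, and only tool results are cleared, not the tool calls themselves.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str     # "user", "assistant", or "tool"
    content: str


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


PLACEHOLDER = "[tool result cleared]"


def clear_stale_tool_results(messages: list[Message], limit: int,
                             keep_recent: int = 2) -> list[Message]:
    """Replace old tool outputs with a short placeholder once the context nears `limit`.

    User and assistant turns are never touched; the `keep_recent` newest tool
    results are kept in full because the agent is most likely to still need them.
    """
    if sum(count_tokens(m.content) for m in messages) < limit:
        return list(messages)
    tool_positions = [i for i, m in enumerate(messages) if m.role == "tool"]
    stale = set(tool_positions[:max(len(tool_positions) - keep_recent, 0)])
    return [Message("tool", PLACEHOLDER) if i in stale else m
            for i, m in enumerate(messages)]
```

Unlike compaction, this keeps the conversation itself intact and only shrinks bulky tool output that the agent has presumably already acted on.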

The new tools are available in public beta on the Claude Developer Platform, as well as through Amazon Bedrock and Google Cloud Vertex AI. Anthropic also provides step-by-step documentation and a cookbook for developers.
