
A new study finds that experienced open-source developers actually work more slowly with AI coding tools, even though they believe they're moving faster.


Researchers at the METR institute ran a randomized trial in early 2025 to see how advanced AI tools affect the productivity of seasoned open-source developers. On average, developers took 19 percent longer to complete real-world tasks when using AI, even though they thought the opposite was true.

The perception gap: Slow feels fast

The study followed 16 experienced developers as they tackled 246 real tasks from their own complex open-source projects. Before starting, the developers predicted that AI would make them 24 percent faster.

To measure the true impact, each task was randomly assigned to one of two conditions: a control condition working without generative AI and an experimental condition using AI assistants, mainly Cursor Pro with leading models such as Claude 3.5 Sonnet and Claude 3.7 Sonnet.
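As a rough illustration of this setup, the assignment can be thought of as an independent coin flip per task. The sketch below is hypothetical and not METR's actual tooling; the issue labels are made up for the example.

```python
# Hypothetical sketch: each real issue is independently randomized into an
# AI-allowed or AI-prohibited condition before the developer starts work.
import random

random.seed(42)  # reproducible assignment
tasks = [f"issue-{i}" for i in range(1, 247)]  # 246 tasks, as in the study
assignment = {task: random.choice(["ai_allowed", "ai_prohibited"]) for task in tasks}

print(assignment["issue-1"])  # e.g. 'ai_allowed'
```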

METR used a randomized controlled trial (RCT), widely considered the gold standard for measuring cause and effect, to assess the true impact of AI tools on developer productivity. | Image: METR

Developers recorded their screens and logged how long each task took. To account for varying difficulty levels, the researchers used a statistical method that included the developers' own time estimates for each task. This let them isolate how much AI use actually changed working time, regardless of whether a task was easy or hard.
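To make that approach concrete, here is a minimal sketch of one way such an analysis could look: a log-linear regression of actual task time on an AI-treatment indicator, controlling for the developer's own pre-task estimate. The data, column names, and simulated effect size are hypothetical assumptions for illustration; METR's actual model may differ.

```python
# Sketch of the kind of analysis described above: regress log task time on
# an AI-treatment indicator while controlling for each developer's own
# pre-task time estimate. All data below is simulated, not METR's.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 246  # number of tasks in the study
df = pd.DataFrame({
    "estimated_hours": rng.lognormal(mean=0.5, sigma=0.6, size=n),
    "ai_allowed": rng.integers(0, 2, size=n),  # random per-task assignment
})
# Simulate actual times roughly 19% longer when AI is allowed (illustration).
df["actual_hours"] = (
    df["estimated_hours"]
    * np.exp(0.174 * df["ai_allowed"])  # log(1.19) ~ 0.174
    * rng.lognormal(mean=0.0, sigma=0.3, size=n)
)

# Log-linear model: the AI coefficient approximates the percent change in
# completion time, holding the developer's own difficulty estimate fixed.
model = smf.ols(
    "np.log(actual_hours) ~ ai_allowed + np.log(estimated_hours)", data=df
).fit()
effect = np.exp(model.params["ai_allowed"]) - 1
print(f"Estimated effect of AI on task time: {effect:+.1%}")
```

Because the developer's estimate enters as a covariate, an unusually hard task does not get misread as an AI slowdown; only the difference between conditions at comparable expected difficulty drives the coefficient.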

Developers, as well as business and ML experts, expected AI to cut completion time by 20 to 40 percent (green), but the METR study found it actually slowed developers down by 19 percent (red); every forecast was too optimistic. | Image: METR

The key finding: while developers consistently expected to save time, the data showed the opposite. Even after finishing their tasks, they still believed AI had made them 20 percent faster, despite actually taking longer.

Real-world impact requires new measurement methods

METR argues that these results show the need for new ways to measure the real-world effects of generative AI. Popular benchmarks like SWE-Bench or RE-Bench typically focus on isolated, context-free tasks and algorithmic evaluation, which can distort the picture. In contrast, randomized controlled trials like this one test real tasks in realistic settings, giving a fuller view of how AI helps or hinders developers in everyday work.

Average time spent on active coding, AI prompting, review, idle, and other activities with AI allowed (green) vs. prohibited (purple): with AI tools enabled, developers spent less time actively coding or searching, and more time prompting, reviewing, waiting, and idling. | Image: METR

I asked our AI developer whether the results match his day-to-day experience. He finds them plausible, especially for mature, complex projects with strict quality requirements and many implicit conventions, which are typical of open-source work. In such settings, AI tools can add extra effort for explaining context and reviewing output.

The situation is different for new projects, rapid prototyping, or work with unfamiliar frameworks. In those scenarios, AI tools can play to their strengths and genuinely support developers.

Summary
  • A randomized study from METR found that experienced open-source developers using advanced AI tools actually took 19 percent longer to complete real programming tasks, even though they expected to work faster.
  • The study also suggests that standard benchmarks like SWE-Bench fail to capture the real-world impact of generative AI on developers.
  • The authors argue that new evaluation methods are needed because current tests fail to factor in actual workflows, context, and team interactions, which can skew how AI tools are measured.