
A new study finds that experienced open-source developers actually work more slowly with AI coding tools, even though they believe they're moving faster.


Researchers at the METR institute ran a randomized trial in early 2025 to see how advanced AI tools affect the productivity of seasoned open-source developers. On average, developers took 19 percent longer to complete real-world tasks when using AI, even though they thought the opposite was true.

The perception gap: Slow feels fast

The study followed 16 experienced developers as they tackled 246 real tasks from their own complex open-source projects. Before starting, the developers predicted that AI would make them 24 percent faster.

To measure the true impact, each task was randomly assigned to one of two conditions: a control condition working without generative AI and an experimental condition using AI assistants, mainly Cursor Pro with leading models such as Claude 3.5 Sonnet and Claude 3.7 Sonnet.
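As a rough illustration of this setup, the assignment can be thought of as an independent coin flip per task. The sketch below is hypothetical and not METR's actual tooling; the issue labels are made up for the example.

```python
# Hypothetical sketch: each real issue is independently randomized into an
# AI-allowed or AI-prohibited condition before the developer starts work.
import random

random.seed(42)  # reproducible assignment
tasks = [f"issue-{i}" for i in range(1, 247)]  # 246 tasks, as in the study
assignment = {task: random.choice(["ai_allowed", "ai_prohibited"]) for task in tasks}

print(assignment["issue-1"])  # e.g. 'ai_allowed'
```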

METR used a randomized controlled trial (RCT), widely considered the gold standard for measuring cause and effect, to assess the true impact of AI tools on developer productivity. | Image: METR

Developers recorded their screens and logged how long each task took. To account for varying difficulty levels, the researchers used a statistical method that included the developers' own time estimates for each task. This let them isolate how much AI use actually changed working time, regardless of whether a task was easy or hard.
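To make that approach concrete, here is a minimal sketch of one way such an analysis could look: a log-linear regression of actual task time on an AI-treatment indicator, controlling for the developer's own pre-task estimate. The data, column names, and simulated effect size are hypothetical assumptions for illustration; METR's actual model may differ.

```python
# Sketch of the kind of analysis described above: regress log task time on
# an AI-treatment indicator while controlling for each developer's own
# pre-task time estimate. All data below is simulated, not METR's.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 246  # number of tasks in the study
df = pd.DataFrame({
    "estimated_hours": rng.lognormal(mean=0.5, sigma=0.6, size=n),
    "ai_allowed": rng.integers(0, 2, size=n),  # random per-task assignment
})
# Simulate actual times roughly 19% longer when AI is allowed (illustration).
df["actual_hours"] = (
    df["estimated_hours"]
    * np.exp(0.174 * df["ai_allowed"])  # log(1.19) ~ 0.174
    * rng.lognormal(mean=0.0, sigma=0.3, size=n)
)

# Log-linear model: the AI coefficient approximates the percent change in
# completion time, holding the developer's own difficulty estimate fixed.
model = smf.ols(
    "np.log(actual_hours) ~ ai_allowed + np.log(estimated_hours)", data=df
).fit()
effect = np.exp(model.params["ai_allowed"]) - 1
print(f"Estimated effect of AI on task time: {effect:+.1%}")
```

Because the developer's estimate enters as a covariate, an unusually hard task does not get misread as an AI slowdown; only the difference between conditions at comparable expected difficulty drives the coefficient.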

Developers, as well as business and ML experts, expected AI to cut completion time by 20 to 40 percent (green), but the METR study found it actually slowed developers down by 19 percent (red); every forecast was too optimistic. | Image: METR

The key finding: while developers consistently expected to save time, the data showed the opposite. Even after finishing their tasks, they still believed AI had made them 20 percent faster, despite actually taking longer.

Real-world impact requires new measurement methods

METR argues that these results show the need for new ways to measure the real-world effects of generative AI. Popular benchmarks like SWE-Bench or RE-Bench typically focus on isolated, context-free tasks and algorithmic evaluation, which can distort the picture. In contrast, randomized controlled trials like this one test real tasks in realistic settings, giving a fuller view of how AI helps or hinders developers in everyday work.

Average time spent on active coding, AI prompting, review, idle, and other activities with AI allowed (green) vs. prohibited (purple): with AI tools enabled, developers spent less time actively coding or searching, and more time prompting, reviewing, waiting, and idling. | Image: METR

I asked our AI developer whether the results match his day-to-day experience. He finds them plausible, especially for mature, complex projects with strict quality requirements and many implicit conventions, which are typical of open-source work. In such settings, AI tools can add extra effort for explaining context and reviewing output.

The situation is different for new projects, rapid prototyping, or work with unfamiliar frameworks. In those scenarios, AI tools can play to their strengths and genuinely support developers.

Summary
  • A randomized study from METR found that experienced open-source developers using advanced AI tools actually took 19 percent longer to complete real programming tasks, even though they expected to work faster.
  • The study also suggests that standard benchmarks like SWE-Bench fail to capture the real-world impact of generative AI on developers.
  • The authors argue that new evaluation methods are needed because current tests fail to factor in actual workflows, context, and team interactions, which can skew how AI tools are measured.