
The introduction of Devin comes with the promise of long-term support for human programmers in the development of software products. However, the real star of the AI assistant is not its generative capabilities.


US-based AI startup Cognition AI, which specializes in applied AI research, has introduced Devin, an AI software developer that can collaborate with human developers as well as perform tasks independently and submit them for review. According to Cognition, Devin outperforms existing language models many times over on a benchmark for solving software problems.

Devin can handle new, unknown libraries with little source code, program complete applications, find bugs in code bases, and process bug reports and feature requests in open-source repositories. Devin also uses machine learning algorithms to constantly learn, improve its performance, and adapt to new challenges, the company said.

According to Cognition, Devin has long-term planning and decision-making capabilities that enable it to execute complex development projects requiring thousands of decisions. Devin can also learn over time and correct its own mistakes. Equipped with common development tools such as a shell, code editor, and browser in an isolated computing environment, Devin can actively work with users, report progress in real time, accept feedback, and collaborate on design decisions as needed.


Significantly better benchmark results than GPT-4

Devin was tested on SWE-bench, a benchmark that asks AI agents to resolve real-world GitHub issues in open-source projects such as Django and scikit-learn. While Devin's solution rate of 13.86 percent is not outstanding, it is significantly better than that of other language models tested on this benchmark, including GPT-4. However, the benchmark does not yet take into account new models such as Claude 3 or GPT-4 Turbo.
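To make the figure concrete: a solution rate is simply the share of attempted issues that were resolved, not a margin over other models. The short sketch below shows the arithmetic. The counts of 79 resolved out of 570 attempted issues are illustrative assumptions chosen to reproduce the reported 13.86 percent; the article itself does not state the underlying numbers, and the function name is hypothetical.

```python
def resolve_rate(resolved: int, attempted: int) -> float:
    """Return the share of attempted issues that were resolved, in percent."""
    if attempted <= 0:
        raise ValueError("attempted must be positive")
    return 100.0 * resolved / attempted


# Assumed, illustrative counts (not stated in the article) that
# reproduce the reported solution rate.
devin_rate = resolve_rate(resolved=79, attempted=570)
print(f"{devin_rate:.2f}%")  # prints "13.86%"
```

Read this way, 13.86 percent means roughly one in seven issues was solved end to end.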

Image: Cognition Labs

Devin is not yet publicly available but has been made available via a waiting list to selected developers who share their experiences on X (formerly Twitter) and elsewhere. Cognition has provided little insight into the technical background, so important questions such as the exact software architecture or AI models used remain unanswered.

Devin may be based on GPT-4 Turbo or Claude 3, with numerous AI agents working in the background. Such agent automation already existed with GPT-3.5. However, Cognition seems to have worked out the concept carefully and put a lot of emphasis on a user-friendly interface.

Initial field reports are promising

One of the early testers is computer science student Andrew Kean Gao, who is putting Devin to the test with various realistic tasks. In one of his experiments, Devin developed a working Chrome extension that summarizes the complete code of a GitHub repository in a text file.

In a much more complex task, developing a chess game in which the player competes against a language model, Devin made remarkable progress but eventually got stuck. Another task, in which Devin was asked to visualize temperature data over time in Antarctica, was not completed satisfactorily, but Devin did at least publish a website directly on Netlify.

Image: Screenshot/Andrew Kean Gao/X
Student Gao concludes that Devin's focus is on UI/UX and not primarily on generative AI. The surrounding infrastructure, not the AI itself, is the star of the product, he says. "They have things built out such as auto deploy to netlify, api key protection, intelligent way to interrupt without interrupting, a good UI that is *tailored to humans* and bridges LLM and human dev, the slider to move backwards in time," he writes.

Big promises with little money

The startup recently closed a $21 million Series A funding round led by Founders Fund and can count on the help of people like Patrick and John Collison (co-founders of Stripe), Elad Gil, Sarah Guo, Chris Ré (Stanford professor), Eric Glyman (co-founder of Ramp) and many others. The sum seems relatively small compared to the funding raised by startups like Cohere, Mistral, or Perplexity.

"We are an applied AI lab focused on reasoning, and code is just the beginning," Cognition says in its X bio. By improving AI's ability to reason, Cognition believes it can open up new possibilities in various disciplines and help people around the world turn their ideas into reality.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Summary
  • US AI startup Cognition has unveiled Devin, an AI software developer that can collaborate with human developers and perform tasks independently. Devin can perform complex development projects, learn, and correct errors.
  • On SWE-bench, a benchmark built from real-world GitHub issues in open-source projects, Devin solved 13.86 percent of the tasks, significantly more than the other language models tested.
  • Cognition recently closed a $21 million Series A funding round and is backed by notable names such as Stripe co-founders Patrick and John Collison. Devin is not yet publicly available and has only been made available to select developers.
Jonathan works as a freelance tech journalist for THE DECODER, focusing on AI tools and how GenAI can be used in everyday work.