Cognition unveils AI-powered software developer Devin for better programming

Mar 13, 2024


The introduction of Devin comes with the promise of long-term support for human programmers in the development of software products. However, the real star of the AI assistant is not its generative capabilities.

US-based AI startup Cognition AI, which specializes in applied AI research, has introduced Devin, an AI software developer that can collaborate with human developers as well as perform tasks independently and submit them for review. According to Cognition, Devin outperforms existing language models many times over on a benchmark for solving software problems.

Devin can handle new, unknown libraries with little source code, program complete applications, find bugs in code bases, and process bug reports and feature requests in open-source repositories. Devin also uses machine learning algorithms to constantly learn, improve its performance, and adapt to new challenges, the company said.

According to Cognition, Devin has long-term planning and decision-making capabilities that enable it to execute complex development projects requiring thousands of decisions. Devin is also capable of learning over time and correcting its own mistakes. Equipped with common development tools such as a shell, code editor, and browser in an isolated computing environment, Devin can actively collaborate with users, report progress in real time, accept feedback, and work through design decisions with them as needed.

Significantly better benchmark results than GPT-4

Devin was tested on SWE-bench, a benchmark that asks AI agents to resolve real-world GitHub issues in open-source projects such as Django and scikit-learn. While Devin's solution rate of 13.86 percent is not outstanding in absolute terms, it is significantly better than that of other language models tested on this benchmark, including GPT-4. However, the benchmark does not yet take into account new models such as Claude 3 or GPT-4 Turbo.

Devin is not yet publicly available but has been made available via a waiting list to selected developers who share their experiences on X (formerly Twitter) and elsewhere. Cognition has provided little insight into the technical background, so important questions such as the exact software architecture or AI models used remain unanswered.

Devin may be based on GPT-4 Turbo or Claude 3, with numerous AI agents working in the background. This kind of agent-based automation was already being attempted with GPT-3.5. However, Cognition seems to have worked out the concept carefully and put a lot of emphasis on a user-friendly interface.

Initial field reports are promising

One of the early testers is computer science student Andrew Kean Gao, who is putting Devin to the test with various realistic tasks. In one of his experiments, Devin developed a working Chrome extension that summarizes the complete code of a GitHub repository in a text file.

In a much more complex task, developing a chess game in which the player competes against a language model, Devin made remarkable progress but eventually got stuck. Another task, in which Devin was asked to visualize temperature data over time in Antarctica, was not completed satisfactorily, although the AI did at least publish a website directly on Netlify.

Gao concludes that Devin's focus is on UI/UX and not primarily on generative AI. The surrounding infrastructure, not the AI itself, is the star of the product, he says. "They have things built out such as auto deploy to netlify, api key protection, intelligent way to interrupt without interrupting, a good UI that is *tailored to humans* and bridges LLM and human dev, the slider to move backwards in time," he writes.

Big promises with little money

Led by Founders Fund, the startup recently closed a $21 million Series A funding round and can count on the help of people like Patrick and John Collison (co-founders of Stripe), Elad Gil, Sarah Guo, Chris Re (Stanford professor), Eric Glyman (co-founder of Ramp) and many others. The sum seems relatively small compared to startups like Cohere, Mistral, or Perplexity.

"We are an applied AI lab focused on reasoning, and code is just the beginning," Cognition says in its X bio. By improving AI's ability to reason, Cognition believes it can open up new possibilities in various disciplines and help people around the world turn their ideas into reality.
