The introduction of Devin comes with the promise of long-term support for human programmers developing software products. However, the AI assistant's real strength is not its generative capabilities.
US-based AI startup Cognition AI, which specializes in applied AI research, has introduced Devin, an AI software developer that can collaborate with human developers as well as perform tasks independently and submit them for review. According to Cognition, Devin outperforms existing language models many times over on a benchmark for solving real-world software problems.
Devin can work with new, unfamiliar libraries even when little source code is available, build complete applications, find bugs in code bases, and process bug reports and feature requests in open-source repositories. According to the company, Devin also uses machine learning algorithms to continuously learn, improve its performance, and adapt to new challenges.
According to Cognition, Devin has long-term planning and decision-making capabilities that enable it to carry out complex development projects requiring thousands of decisions. Devin can also learn from and correct its mistakes over time. Equipped with common development tools such as a shell, a code editor, and a browser in an isolated computing environment, Devin can actively collaborate with users, report progress in real time, accept feedback, and work with them on design decisions as needed.
Significantly better benchmark results than GPT-4
Devin was tested on SWE-bench, a benchmark that asks AI agents to resolve real-world GitHub issues in open-source projects such as Django and scikit-learn. While Devin's resolution rate of 13.86 percent is not outstanding in absolute terms, it is significantly higher than that of the other language models tested on this benchmark, including GPT-4. However, the comparison does not yet include newer models such as Claude 3 or GPT-4 Turbo.
Devin is not yet publicly available. Selected developers have been granted access via a waiting list and are sharing their experiences on X (formerly Twitter) and elsewhere. Cognition has provided little insight into the technical background, so important questions, such as the exact software architecture or the AI models used, remain unanswered.
Devin may be based on GPT-4 Turbo or Claude 3 and may rely on numerous AI agents working in the background. Similar agent-based automation was already possible with GPT-3.5. However, Cognition appears to have worked out the concept carefully and placed strong emphasis on a user-friendly interface.
Initial field reports are promising
One of the early testers is computer science student Andrew Kean Gao, who is putting Devin to the test with various realistic tasks. In one of his experiments, Devin developed a working Chrome extension that summarizes the complete code of a GitHub repository in a text file.
In a much more complex task, developing a chess game in which the user plays against a language model, Devin made remarkable progress but eventually got stuck. Another task, visualizing temperature data over time in Antarctica, was not completed satisfactorily, although Devin did publish a website directly to Netlify.
Big promises with little money
The startup recently closed a $21 million Series A funding round led by Founders Fund, and its backers include Patrick and John Collison (co-founders of Stripe), Elad Gil, Sarah Guo, Chris Ré (Stanford professor), Eric Glyman (co-founder of Ramp), and many others. The sum seems relatively small compared with the amounts raised by startups such as Cohere, Mistral, or Perplexity.
"We are an applied AI lab focused on reasoning, and code is just the beginning," Cognition says in its bio on X. By improving AI's ability to reason, Cognition believes it can open up new possibilities in various disciplines and help people around the world turn their ideas into reality.