Cosine, a San Francisco-based AI startup, has unveiled Genie, a new AI model designed to assist software developers. The company claims Genie significantly outperforms competitors in benchmarks.

Working with OpenAI, Cosine trained a GPT-4o variant using high-quality data, resulting in impressive benchmark results. The company says the key to Genie's performance is its ability to "codify human reasoning," which could have applications beyond software development.

Genie takes the SWE-lead

Cosine co-founder and CEO Alistair Pullen reports that Genie achieved a 30 percent score on the SWE-Bench test, the highest ever for an AI model in this area. This surpasses other coding-focused language models, including those from Amazon (19 percent) and Cognition's Devin (13.8 percent on a portion of SWE-Bench).

Genie's architecture aims to replicate the cognitive processes of human developers. It can autonomously and collaboratively fix bugs, develop new software features, restructure code, and perform various programming tasks.

Self-improvement through synthetic data

The model was developed using a proprietary process that trained and fine-tuned a non-public GPT-4o variant on billions of tokens of high-quality data. Cosine spent nearly a year curating this data with help from experienced developers. The dataset comprises 21 percent each of JavaScript and Python, 14 percent each of TypeScript and TSX, and 3 percent each of other languages, ranging from Java to C++ to Ruby.
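
As a quick sanity check on the reported language mix, the arithmetic can be written out. This is only a back-of-the-envelope sketch: it assumes the shares add up to 100 percent, which the article does not state, and the variable names are illustrative.

```python
# Shares reported in the article, in percent.
named_shares = {"JavaScript": 21, "Python": 21, "TypeScript": 14, "TSX": 14}

# Combined share of the four named languages.
named_total = sum(named_shares.values())

# ASSUMPTION: the full mix sums to 100 percent.
remaining = 100 - named_total

# At 3 percent each, this is how many "other" languages would fit.
other_language_count = remaining // 3

print(named_total, remaining, other_language_count)
```

Under that assumption, the four named languages account for 70 percent of the data, leaving room for roughly ten other languages at 3 percent each.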

A key factor in Genie's performance was its self-improving training. Initially, the model learned mostly from perfect, working code, but struggled with its own mistakes.

Cosine addressed this by using synthetic data: if Genie's first proposed solution was incorrect, the model was shown how to improve using the correct result. With each iteration, Genie's solutions improved, requiring fewer corrections.
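
The correction loop described here can be sketched roughly as follows. This is an illustration under stated assumptions, not Cosine's actual pipeline, which is proprietary: `build_correction_examples`, the `propose` callable, and the record fields are hypothetical names, and the string comparison stands in for whatever correctness check is really used.

```python
from typing import Callable, Dict, List, Tuple


def build_correction_examples(
    tasks: List[Tuple[str, str]],
    propose: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Collect synthetic training records from the model's own mistakes.

    For each (task, correct_solution) pair, the model proposes a first
    attempt. If the attempt is wrong, a record pairing the flawed attempt
    with the known-correct result is emitted.
    """
    examples = []
    for task, correct in tasks:
        attempt = propose(task)
        # Placeholder check (assumption): exact match after trimming.
        if attempt.strip() != correct.strip():
            examples.append(
                {"task": task, "flawed_attempt": attempt, "target": correct}
            )
    return examples


# Toy stand-in for a model: gets "add" wrong and "mul" right.
def toy_model(task: str) -> str:
    return {"add": "return a - b", "mul": "return a * b"}[task]


tasks = [("add", "return a + b"), ("mul", "return a * b")]
corrections = build_correction_examples(tasks, toy_model)
print(len(corrections), corrections[0]["task"])
```

In a setup like this, the number of records produced per round would shrink as the model improves, which matches the article's description of fewer corrections being needed over time.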

Overcoming technical limitations

Pullen recognized the potential of large language models to support human software developers in early 2022. However, the technology was not yet advanced enough at the time to realize his vision for Genie.

At the time, context windows were often limited to 4,000 tokens, which was a major bottleneck. Today, models like Gemini 1.5 Pro can process up to two million tokens in a single prompt. Cosine hasn't disclosed Genie's token capacity.

Cosine believes it has an advantage over coding assistants that simply wrap general models like GPT-4 as separate products. "Everyone working on this problem is butting up against the same limit of model intelligence, this is why we chose to train rather than prompt," Pullen explains.

Beyond software development

Cosine plans to expand its portfolio with smaller, specialized models and larger, more general ones. The company will also increase its involvement in open-source communities and regularly enhance Genie's capabilities based on customer feedback. Given the size of the dataset, full model training isn't ruled out for the future.

Pullen believes the concept has applications beyond programming: "We truly believe that we’re able to codify human reasoning for any job and industry. Software engineering is just the most intuitive starting point and we can’t wait to show you everything else we’re working on."

Cosine will offer Genie in two pricing tiers: a roughly $20 option with some feature and usage limitations, and an enterprise-level offering with advanced features and virtually unlimited usage. Currently, interested parties can only join a waiting list.

The Y Combinator spin-off recently secured $2.5 million in seed funding from various venture capital firms to support Genie's development and plans.

Cosine appears to have combined key insights from recent AI developments: mimicking human work can lead to improved outcomes, and the quality of the training data is crucial. AI models like Genie could have a significant impact on the software development industry by increasing productivity. For now, however, the only benchmark results available are those reported by the company itself.

Summary
  • AI startup Cosine has unveiled Genie, a new model trained in collaboration with OpenAI that scored a record 30 percent on the SWE-Bench test, significantly higher than competitors such as Amazon or Cognition's Devin.
  • Genie was trained using a proprietary method that fine-tuned a non-public GPT-4o variant with billions of tokens of high-quality data curated by expert developers across multiple programming languages. Self-improving training with synthetic data helped the model deal with its own errors.
  • Cosine plans to expand its portfolio to include specialized and more general models, and to extend Genie's capabilities based on customer feedback. The company believes it can "codify human reasoning" for any profession or industry, and plans to offer Genie to both individuals and organizations.
Jonathan works as a technology journalist who focuses primarily on how easily AI can already be used today and how it can support daily life.