Cosine's Genie: The AI Engineer You Wish You Had But Can't Afford to Hire

8/18/24

Editorial team at Bits with Brains

Key Takeaways:

Cosine's Genie is a new state-of-the-art AI model designed for software engineering tasks
Genie outperformed previous AI coding models like Devin and Amazon Q on the SWE-Bench benchmark
The model can autonomously fix bugs, build features, refactor code, and collaborate with human developers
Cosine's approach focuses on emulating human reasoning to create an AI that behaves like a skilled colleague

Cosine's Genie: Another Breakthrough in AI Software Engineering

Cosine, a Y Combinator-backed startup based in San Francisco, has unveiled Genie, an advanced AI model that excels at a wide range of software engineering tasks. Genie achieved a remarkable 30% score on the industry-standard SWE-Bench benchmark, significantly surpassing previous top performers like Cognition's Devin (13.8%) and Amazon's Q (19%).
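
To put those figures in perspective, the short sketch below works out the gap between the three reported scores, both in percentage points and as a ratio. It uses only the numbers quoted above; the function name margin_over is our own and is purely illustrative.

```python
from typing import Tuple

# SWE-Bench scores as quoted in this article (percent).
scores = {"Genie": 30.0, "Amazon Q": 19.0, "Devin": 13.8}


def margin_over(baseline: str, leader: str = "Genie") -> Tuple[float, float]:
    """Return (percentage-point gap, ratio) of the leader over the baseline."""
    gap = round(scores[leader] - scores[baseline], 1)
    ratio = round(scores[leader] / scores[baseline], 2)
    return gap, ratio


if __name__ == "__main__":
    for name in ("Amazon Q", "Devin"):
        gap, ratio = margin_over(name)
        print(f"Genie vs {name}: +{gap} points, {ratio}x")
```

On these numbers, Genie's score is roughly 1.6 times Amazon Q's and a little over twice Devin's.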

The key to Genie's performance lies in how Cosine developed the model. Because Cosine focused on teaching the AI to mimic the cognitive processes of human engineers, Genie can tackle coding tasks in a more intuitive and collaborative manner.

Cosine co-founder and CEO Alistair Pullen explains, "My thesis on this is simple: make it watch how a human engineer does their job and mimic that process." This approach allows Genie to behave more like a skilled colleague than a mere coding assistant.

Versatile Coding Capabilities

Genie is proficient in 15 programming languages, including JavaScript, Python, TypeScript, Java, C++, and more. The model can autonomously fix bugs, build new features, refactor code, and validate its work through comprehensive testing.

Genie can operate independently or in tandem with human developers, providing an experience akin to working with a knowledgeable teammate. Furthermore, Genie integrates seamlessly with tools like Slack and GitHub, enabling it to alert users, ask clarifying questions, and respond to comments on pull requests, just as a human colleague would. This integration streamlines the development workflow and enhances collaboration between the AI and human developers.

Training Methodology and Self-Improvement

In developing Genie, Cosine spent nearly a year curating high-quality training data with the help of experienced developers. The dataset comprises a diverse range of programming languages, with JavaScript and Python each making up 21%, TypeScript and TSX at 14% each, and various other languages like Java, C++, and Ruby at 3% each.

A crucial aspect of Genie's training was its ability to learn from its own mistakes through synthetic data. If Genie's initial proposed solution was incorrect, the model was shown how to improve using the correct result. With each iteration, Genie's solutions improved, requiring fewer corrections. This self-improving training methodology played a significant role in Genie's exceptional performance.
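
The loop described above can be sketched in a few lines. This is a toy illustration under heavy assumptions, not Cosine's actual pipeline: ToyModel, CorrectionExample, collect_corrections, and run_rounds are hypothetical names, the "model" simply memorises fixes, and an exact string match stands in for running real tests.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CorrectionExample:
    """One synthetic record: a wrong attempt paired with the known-good result."""
    task: str
    wrong_attempt: str
    correct_solution: str


def collect_corrections(
    tasks: List[str],
    references: List[str],
    propose: Callable[[str], str],
) -> List[CorrectionExample]:
    """Ask the model for a solution to each task and keep the ones it got wrong."""
    examples = []
    for task, reference in zip(tasks, references):
        attempt = propose(task)
        if attempt != reference:  # assumption: exact match stands in for a test run
            examples.append(CorrectionExample(task, attempt, reference))
    return examples


class ToyModel:
    """Hypothetical stand-in for a model; it 'learns' by memorising a few fixes."""

    def __init__(self, capacity_per_round: int = 2) -> None:
        self.memory: Dict[str, str] = {}
        self.capacity_per_round = capacity_per_round

    def propose(self, task: str) -> str:
        return self.memory.get(task, "<no solution>")

    def learn(self, examples: List[CorrectionExample]) -> None:
        # Absorb only a limited number of corrections per round.
        for ex in examples[: self.capacity_per_round]:
            self.memory[ex.task] = ex.correct_solution


def run_rounds(
    model: ToyModel, tasks: List[str], references: List[str], rounds: int
) -> List[int]:
    """Return how many corrections were needed in each round."""
    counts = []
    for _ in range(rounds):
        examples = collect_corrections(tasks, references, model.propose)
        counts.append(len(examples))
        model.learn(examples)
    return counts


if __name__ == "__main__":
    tasks = ["fix bug A", "add feature B", "refactor C"]
    references = ["patch A", "patch B", "patch C"]
    print(run_rounds(ToyModel(), tasks, references, rounds=3))
```

In this toy setup the number of corrections falls from round to round, which mirrors the behaviour the article describes. A real system would fine-tune the model on the collected examples rather than memorise them.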

Cosine's Mission and Funding

Founded in 2022, Cosine is trying to push the boundaries of AI by applying human reasoning to complex problems, starting with software engineering. The company recently raised $2.5 million in seed funding from investors such as Uphonest, SOMA Capital, Lakestar, and Focal.

Cosine's co-founders, CEO Alistair Pullen, COO Yang Li, and CTO Sam Stenner, first recognized the potential of large language models to imitate human software developers in early 2022. By codifying human reasoning and using the result to train Genie's underlying language model, they created an AI that is "uncannily human" in its approach to reasoning.

Pullen believes that Cosine's breakthroughs in human reasoning have enabled it to build AI models that can "operate far beyond the narrow range of tasks and tightly restricted prompts currently available to teams developing software." The company's vision extends beyond software engineering, with plans to apply its approach to various jobs and industries in the future.

Implications and Future Plans

By increasing productivity and reducing the time spent on routine tasks, Genie could transform the way engineering resources are allocated, allowing teams to focus on more strategic initiatives.

Cosine has ambitious plans for Genie's future development. The company intends to expand its model portfolio to include smaller models for simpler tasks and larger models capable of handling more complex challenges. Additionally, Cosine plans to extend its work into open-source communities by context-extending one of the leading open-source models and pre-training on a vast dataset.

The company says it remains committed to continuous improvement, with plans to ship regular updates to Genie's capabilities based on customer feedback. While Genie is already being rolled out to select users, broader access is still being managed through a waiting list on the Cosine website.

Pricing and Availability

Cosine plans to offer Genie in two pricing tiers:

An accessible option priced around $20, with some feature and usage limitations, showcasing Genie's capabilities for individuals and small teams.
An enterprise-level offering with expanded features, virtually unlimited usage, and the ability to create a perfect AI colleague who's an expert in every line of code ever written internally. This tier will be priced more substantially, reflecting its value as a full AI engineering colleague.

Currently, interested parties can apply for early access to try Genie on their projects by filling out a web form on the Cosine website.

FAQs

Q: How does Genie compare to other AI coding models?

A: Genie outperformed models like Devin and Amazon Q on the SWE-Bench benchmark, scoring 30% compared to their respective scores of 13.8% and 19%. On this benchmark, that places Genie well ahead of both, though a single benchmark does not capture every aspect of software engineering work.

Q: What programming languages does Genie support?

A: Genie is proficient in 15 languages, including JavaScript, Python, TypeScript, Java, C#, C++, Rust, Scala, Kotlin, Swift, Golang, PHP, and Ruby.

Q: How does Genie ensure the security of generated code?

A: The code generated by Genie is stored directly in the user's GitHub repository, meaning that Cosine does not retain copies or introduce additional security risks.

Q: Is Genie designed to replace human developers?

A: No, Cosine emphasizes that Genie is intended to augment and collaborate with human developers, not replace them entirely. The model behaves like a skilled colleague, working alongside humans to tackle complex software engineering tasks.

Q: What's next for Cosine and Genie?

A: Cosine believes that its approach to codifying human reasoning can be applied to various jobs and industries beyond software development. The company is actively working on expanding its AI capabilities and plans to showcase its progress in the near future.
