The new S* framework helps AI language models generate more reliable, higher-quality code.
Researchers at the University of California, Berkeley have created a framework called S* that improves how AI language models generate code. The system combines two different approaches - parallel and sequential scaling - with a new way of selecting the best results.
While generating multiple code snippets at once and picking the best one (parallel scaling) isn't new, the Berkeley team added something extra: they combined it with sequential scaling, in which the system iteratively refines its solutions through systematic debugging.
The framework builds on test-time compute as one of its core building blocks, but with a twist: unlike current reasoning models such as OpenAI's o1, S* incorporates external feedback from code execution rather than relying solely on internal reasoning chains. This design makes it compatible with both traditional large language models (LLMs) and newer large reasoning models (LRMs).
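To make the idea concrete, here is a minimal sketch of such a hybrid loop in Python. The function and helper names (generate, revise, run_tests) are illustrative assumptions, not the paper's actual implementation.

```python
from typing import Callable, Tuple

# Illustrative sketch of S*-style hybrid scaling, not the authors' code.
# The three callables stand in for whatever model client and sandbox you use.

def hybrid_scaling(
    problem: str,
    generate: Callable[[str], str],                     # problem -> candidate program
    revise: Callable[[str, str, str], str],             # (problem, code, error log) -> revised program
    run_tests: Callable[[str, str], Tuple[bool, str]],  # (problem, code) -> (all passed?, error log)
    num_samples: int = 8,
    max_rounds: int = 3,
) -> list[str]:
    """Parallel scaling: draw several candidate programs.
    Sequential scaling: debug each candidate iteratively using execution feedback."""
    candidates = [generate(problem) for _ in range(num_samples)]

    refined = []
    for code in candidates:
        for _ in range(max_rounds):
            passed, error_log = run_tests(problem, code)
            if passed:
                break
            # External feedback: the concrete error log is fed back into the prompt
            # instead of relying solely on the model's internal reasoning chain.
            code = revise(problem, code, error_log)
        refined.append(code)
    return refined
```

The refined candidates then go into the selection step described in the next section.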
Using AI to evaluate code solutions
The second key innovation is what the team calls "adaptive input synthesis." In testing, the researchers used GPT-4o mini to generate test inputs for the different candidate solutions. By executing the candidates on these inputs and analyzing the actual outputs, the model can reliably identify the best solution.
The system asks the AI model to create test inputs specifically designed to spot differences between two programs. It uses carefully crafted prompts that tell the model to consider edge cases (like empty inputs or extreme values), generate complex but manageable test cases, and create inputs that could reveal potential errors.
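A rough sketch of how such a prompt might be assembled follows; the wording is illustrative and not the paper's exact prompt.

```python
# Illustrative prompt for synthesizing distinguishing test inputs.
# The actual prompts used in the S* paper are worded differently.

def build_distinguishing_prompt(problem: str, program_a: str, program_b: str) -> str:
    return f"""You are given a programming problem and two candidate solutions.

Problem:
{problem}

Program A:
{program_a}

Program B:
{program_b}

Generate test inputs that are likely to make the two programs produce different outputs.
- Consider edge cases such as empty inputs or extreme values.
- Prefer complex but still manageable test cases.
- Focus on inputs that could reveal potential errors.
Return one test input per line."""
```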
The system then runs both programs on these test inputs and feeds the results back to the AI model, which decides which solution works better based on the actual execution outcomes.
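Put together, the selection step could look roughly like the following sketch. It assumes candidate programs are plain Python scripts reading from stdin, and it relies on a hypothetical llm_judge helper that returns "A" or "B"; neither assumption comes from the paper itself.

```python
import subprocess

def run_program(code: str, test_input: str, timeout: float = 5.0) -> str:
    """Execute a candidate Python program with the given stdin and capture its stdout."""
    try:
        result = subprocess.run(
            ["python", "-c", code],
            input=test_input,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "<timed out>"
    return result.stdout.strip()

def select_better(problem: str, program_a: str, program_b: str,
                  test_inputs: list[str], llm_judge) -> str:
    """Run both candidates on the synthesized inputs and let the model decide
    which one behaves correctly, based on the observed outputs."""
    observations = []
    for test_input in test_inputs:
        out_a = run_program(program_a, test_input)
        out_b = run_program(program_b, test_input)
        observations.append(f"input: {test_input!r}\nA -> {out_a!r}\nB -> {out_b!r}")
    # llm_judge is an assumed helper that prompts the model with the problem,
    # both programs, and the observed outputs, and returns "A" or "B".
    verdict = llm_judge(problem, program_a, program_b, "\n\n".join(observations))
    return program_a if verdict.strip().upper().startswith("A") else program_b
```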
S* framework significantly improves performance of small models
The team tested S* with 12 different language models of varying sizes and types and found consistent improvements across the board. Qwen2.5-Coder-7B-Instruct with S* performed about 10 percent better than the much larger Qwen2.5-Coder-32B-Instruct without it. In some cases, smaller models using S* even outperformed larger reasoning models: GPT-4o mini with S* beat o1-preview. Even powerful reasoning models showed improvement when using the framework.
The framework does have some clear constraints. It's currently optimized only for programming competition tasks and hasn't been tested on more complex software engineering challenges. The team also focused exclusively on improving accuracy, setting aside questions of resource efficiency.
The approach of combining iterative refinement with search likely contributed to OpenAI's success on the ARC-AGI benchmark, where the company made multiple parallel queries to its o3 reasoning model and selected the best answers - though the exact method remains undisclosed. S* follows a similar philosophy and could lead to better code generation capabilities in the future.