
Japanese AI startup Sakana AI built an AI agent that tackles the kind of complex optimization problems found across industry. In a live competition, the agent went head-to-head with more than 1,000 human programmers.


Sakana AI's ALE agent landed in 21st place at the 47th AtCoder Heuristic Contest, showing that AI systems can hold their own against human experts in demanding programming challenges. AtCoder is a Japanese platform that runs programming competitions in which participants solve complex optimization problems through code. These problems are "NP-hard": no known algorithm solves them efficiently, which makes them particularly challenging.

The tasks mirror real industrial headaches: planning delivery routes, organizing work shifts, managing factory production, and balancing power grids. Human contestants typically spend weeks refining their solutions.

The work builds on ALE-Bench, which Sakana AI describes as the first benchmark for score-based algorithmic programming. The benchmark draws on 40 tough optimization problems from past AtCoder contests. Unlike traditional tests that simply mark answers right or wrong, ALE-Bench demands continuous improvement over extended periods.

ALE-Bench combines NP-hard AtCoder heuristic contest tasks with a modular agent framework in which language models iteratively optimize solutions through code submissions, test runs, and visualizations, competing via leaderboard rankings. | Image: Sakana AI

AI agent mixes expert knowledge with smart search

The ALE agent runs on Google's Gemini 2.5 Pro and combines two main strategies. First, it bakes expert knowledge about proven solution methods directly into its instructions. This includes techniques like simulated annealing, which tests random changes to solutions and sometimes accepts worse results to escape local dead ends.
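For illustration, here is a minimal simulated annealing sketch in Python. It shows the general technique described above, not Sakana's actual prompts or code, and the toy cost function in the demo is an assumption for demonstration purposes.

```python
import math
import random

def simulated_annealing(initial, neighbor, cost, t_start=10.0, t_end=0.01, steps=10_000):
    """Minimize `cost` by testing random changes and occasionally
    accepting worse solutions to escape local optima."""
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    for step in range(steps):
        # Geometric cooling: temperature falls from t_start to t_end.
        t = t_start * (t_end / t_start) ** (step / steps)
        candidate = neighbor(current)
        delta = cost(candidate) - current_cost
        # Always accept improvements; accept worse moves with
        # probability exp(-delta / t), which shrinks as t cools.
        if delta <= 0 or random.random() < math.exp(-delta / t):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost

if __name__ == "__main__":
    # Toy demo (illustrative only): minimize (x - 3)^2 starting from x = 0.
    best, best_cost = simulated_annealing(
        initial=0.0,
        neighbor=lambda x: x + random.uniform(-1, 1),
        cost=lambda x: (x - 3) ** 2,
    )
    print(f"best x = {best:.3f}, cost = {best_cost:.5f}")
```

The occasional acceptance of worse candidates is exactly what lets the method climb out of the local dead ends mentioned above.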

Simulated annealing helped boost the ALE agent's score on one task from 4.9 million to 6.2 million points. | Image: Sakana AI

Second, the system uses a systematic search algorithm called "best-first search," which always picks the most promising partial solution and develops it further. The agent extends this with a "beam search"-like approach, pursuing 30 different solution paths simultaneously. It also uses a "tabu search" mechanism that remembers previously tested solutions to avoid repeating them.
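To make these three ideas concrete, here is a minimal Python sketch combining a best-first priority queue, a beam of parallel candidates, and a tabu set of visited states. It illustrates the general technique rather than the ALE agent's actual implementation; the toy `expand` and `score` functions in the demo are assumptions, and states must be hashable for the tabu set to work.

```python
import heapq
from itertools import count

def beam_best_first_search(start, expand, score, beam_width=30, iterations=100):
    """Repeatedly develop the most promising candidates (best-first),
    keep at most beam_width of them per round (beam search), and
    never revisit a state seen before (tabu search)."""
    tie = count()  # tiebreaker so the heap never compares states directly
    frontier = [(-score(start), next(tie), start)]  # max-heap via negated scores
    tabu = {start}
    best, best_score = start, score(start)
    while iterations and frontier:
        iterations -= 1
        next_frontier = []
        # Pop the highest-scoring candidates first, up to the beam width.
        for _ in range(min(beam_width, len(frontier))):
            _, _, state = heapq.heappop(frontier)
            for child in expand(state):
                if child in tabu:  # skip previously tested solutions
                    continue
                tabu.add(child)
                s = score(child)
                heapq.heappush(next_frontier, (-s, next(tie), child))
                if s > best_score:
                    best, best_score = child, s
        frontier = next_frontier
    return best, best_score

if __name__ == "__main__":
    # Toy demo (illustrative only): flip bits to maximize the number of 1s.
    def expand(bits):
        return [bits[:i] + (1 - bits[i],) + bits[i + 1:] for i in range(len(bits))]
    best, s = beam_best_first_search((0,) * 10, expand, score=sum, beam_width=5)
    print(best, s)
```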

In testing, the best-performing standalone model (o4-mini-high) hit 1,411 points through sequential improvements. Under identical conditions, GPT-4.1 mini scored 1,016 points, DeepSeek-R1 reached 1,150 points, and Gemini 2.5 Pro achieved 1,198 points.

The full ALE agent beat these results with 1,879 points, landing it in the top 6.8 percent. On one specific problem, the agent scored 2,880 points, which would have earned 5th place in the original competition.

With 1,879 points, the ALE agent clearly leads the ALE-Bench comparison, ahead of standalone models at 1,016 to 1,411 points. | Image: Sakana AI

One major difference between the AI and human contestants shows up in their working style. While humans might test a dozen different solutions during a four-hour competition, Sakana's AI can cycle through around 100 versions in the same timeframe. Across its search, the ALE agent churned out hundreds to thousands of candidate solutions - a pace no human could match.


Sakana AI released ALE-Bench as a Python library with a built-in "code sandbox" for safe testing. The framework works with C++, Python, and Rust, running on standard Amazon cloud infrastructure. The company developed the benchmark with AtCoder Inc. The data from 40 competition problems is available on Hugging Face, and the code is publicly accessible on GitHub.
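To give a rough idea of what such a code sandbox has to do - compile a submission, run it on a test input under a time limit, and collect the output for scoring - here is a generic Python sketch. It is not ALE-Bench's actual API; the function name and flow are invented for this illustration.

```python
import subprocess
import tempfile
from pathlib import Path

def compile_and_run(cpp_source: str, stdin_text: str, time_limit: float = 2.0) -> str:
    """Compile a C++ submission and run it on one test input.
    A generic sketch; a real harness like ALE-Bench's adds isolation,
    per-problem scorers, and leaderboard comparisons on top."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "main.cpp"
        binary = Path(tmp) / "main"
        src.write_text(cpp_source)
        # Compile with optimizations, as contest judges typically do.
        subprocess.run(["g++", "-O2", "-o", str(binary), str(src)], check=True)
        # Run with a wall-clock limit; raises subprocess.TimeoutExpired
        # if the program overruns it.
        proc = subprocess.run([str(binary)], input=stdin_text,
                              capture_output=True, text=True, timeout=time_limit)
        return proc.stdout
```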

Summary
  • Sakana AI's ALE agent achieved 21st place among more than 1,000 human participants in a live AtCoder Heuristic Contest; the agent runs on Google's Gemini 2.5 Pro.
  • The AI solved complex optimization problems tied to real-world industrial challenges, such as route planning, shift scheduling, and production planning.
  • While human contestants typically test around a dozen code versions in a four-hour competition, the AI cycled through around 100 revisions and generated hundreds to thousands of candidate solutions.