MLE-STAR is designed to automate machine learning pipelines with minimal human input

Aug 4, 2025


MLE-STAR, a new AI agent from Google Research, combines web search, targeted code refinement, and custom ensemble strategies to automate much of the machine learning process. Early results show significant performance gains with minimal human input.

The system is designed to tackle complex machine learning tasks across different data types, using just a task description and the provided data to generate executable Python scripts. According to Google Research, most existing machine learning engineering (MLE) agents rely on standard tools like scikit-learn and aren't very flexible when it comes to exploring alternative models or pipeline components. They also tend to rewrite the entire codebase at once, making it hard to improve specific steps like feature engineering.

Web search instead of trial and error

MLE-STAR takes a multi-step approach. First, the agent uses web search to find up-to-date model ideas, then creates an initial solution. Next, it analyzes which part of the code - whether it's feature engineering, model selection, or building ensembles - has the biggest impact on performance.

The agent then zeroes in on that block, refining it step by step based on feedback from previous experiments. Each iteration uses the improved script from the round before as its starting point.
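The loop described above can be sketched in a few lines of Python. This is a toy illustration, not Google's implementation: the block names, the `TOY_SCORES` table, and the functions `evaluate`, `most_impactful_block`, and `refine` are all hypothetical, and the rule that each round targets a block that has not been refined yet is an assumption of this sketch.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical toy pipeline: each code block name maps to the variant in use.
Pipeline = Dict[str, str]

# Made-up validation score contributions per block variant (illustration only).
TOY_SCORES: Dict[str, Dict[str, float]] = {
    "features": {"none": 0.00, "raw": 0.10, "scaled": 0.12, "engineered": 0.20},
    "model": {"none": 0.00, "linear": 0.30, "boosted_trees": 0.45},
    "ensemble": {"none": 0.00, "average": 0.02, "stacking": 0.03},
}


def evaluate(pipeline: Pipeline) -> float:
    """Stand-in for running the script and reading a validation score."""
    return sum(TOY_SCORES[block][variant] for block, variant in pipeline.items())


def most_impactful_block(
    pipeline: Pipeline,
    score_fn: Callable[[Pipeline], float],
    exclude: Optional[Set[str]] = None,
) -> str:
    """Ablate each block in turn and return the one whose removal hurts most."""
    exclude = exclude or set()
    base = score_fn(pipeline)
    drops: Dict[str, float] = {}
    for block in pipeline:
        if block in exclude:
            continue
        ablated = {**pipeline, block: "none"}
        drops[block] = base - score_fn(ablated)
    return max(drops, key=drops.get)


def refine(
    pipeline: Pipeline,
    score_fn: Callable[[Pipeline], float],
    candidates: Dict[str, List[str]],
    rounds: int,
) -> Pipeline:
    """Each round starts from the best script so far and changes one block only."""
    best = dict(pipeline)
    refined: Set[str] = set()
    for _ in range(min(rounds, len(pipeline))):
        target = most_impactful_block(best, score_fn, exclude=refined)
        refined.add(target)
        for variant in candidates.get(target, []):
            trial = {**best, target: variant}
            if score_fn(trial) > score_fn(best):
                best = trial  # keep the improvement as the new starting point
    return best


initial: Pipeline = {"features": "raw", "model": "linear", "ensemble": "average"}
candidates = {
    "features": ["scaled", "engineered"],
    "model": ["boosted_trees"],
    "ensemble": ["stacking"],
}
result = refine(initial, evaluate, candidates, rounds=2)
```

In this toy setup, the first round targets the model block because removing it costs the most, and the second round moves on to feature engineering. A real agent would generate candidate code with an LLM and run actual experiments instead of looking up scores in a table.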

MLE-STAR also creates multiple solution variants and develops its own ensemble strategy, improving it over time to maximize predictive power. To keep results reliable, MLE-STAR includes three extra modules: a debugging agent that fixes runtime errors, a data leak checker that blocks unauthorized access to test data during training, and a data usage checker that makes sure all available data sources are used - not just basic CSV files.
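To give a rough idea of what tuning an ensemble strategy can look like, here is a minimal sketch that searches for the best blending weight between two candidate solutions on validation data. The article does not say which strategies MLE-STAR ends up with, so the weighted average, the function names, and the numbers are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def blend(a: Sequence[float], b: Sequence[float], w: float) -> List[float]:
    """Weighted average of two prediction vectors; w is the weight on `a`."""
    return [w * x + (1 - w) * y for x, y in zip(a, b)]


def best_blend_weight(
    a: Sequence[float],
    b: Sequence[float],
    target: Sequence[float],
    steps: int = 10,
) -> Tuple[float, float]:
    """Grid-search the blend weight that minimizes validation error."""
    best_w, best_err = 0.0, mse(b, target)  # w = 0 means "use b only"
    for i in range(1, steps + 1):
        w = i / steps
        err = mse(blend(a, b, w), target)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err


# Made-up validation targets and predictions from two hypothetical solutions:
# one overshoots by 0.5, the other undershoots by 0.5.
val_target = [1.0, 2.0, 3.0, 4.0]
preds_a = [1.5, 2.5, 3.5, 4.5]
preds_b = [0.5, 1.5, 2.5, 3.5]

weight, error = best_blend_weight(preds_a, preds_b, val_target)
```

Because the two toy solutions err in opposite directions, an even blend cancels their errors, which is the basic reason ensembles of different variants can beat each individual solution.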

Nearly 64 percent medal rate in Kaggle competitions

Google tested MLE-STAR on MLE-Bench-Lite, a benchmark suite based on real Kaggle competitions. The agent earned a medal in 63.6 percent of cases, a big jump from the previous best of 25.8 percent. Of those, 36 percent were gold medals. Google says all it took was a short initial prompt - the system handled the rest on its own.

This performance boost comes in part from using modern models like EfficientNet and ViT, instead of older architectures like ResNet favored by competing systems. MLE-STAR also allows for manual tweaks: for example, the RealMLP model was successfully integrated after its description was added by hand.

Fixing LLM hallucinations

The team found that Gemini 2.5 Flash and Pro sometimes generated unrealistic or faulty code, such as using test data when computing normalization statistics. In those cases, the data leak checker stepped in. During testing, the data usage checker also caught data sources that the generated scripts had ignored and made sure they were included.
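The normalization mistake mentioned above is a common form of data leakage. The following sketch contrasts the leaky and the correct pattern using only Python's standard library. It shows the general problem, not how MLE-STAR's checker detects it, and the helper names and numbers are made up.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def fit_scaler(values: List[float]) -> Tuple[float, float]:
    """Compute the mean and (population) standard deviation used for scaling."""
    return mean(values), pstdev(values)


def transform(values: List[float], mu: float, sigma: float) -> List[float]:
    """Standardize values with previously fitted statistics."""
    return [(v - mu) / sigma for v in values]


# Made-up feature values for illustration.
train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [10.0, 12.0]

# Leaky: the statistics are computed on train AND test data, so information
# about the test set influences how the training data is prepared.
leaky_mu, leaky_sigma = fit_scaler(train + test)

# Correct: fit on the training data only, then reuse those statistics for test.
mu, sigma = fit_scaler(train)
train_scaled = transform(train, mu, sigma)
test_scaled = transform(test, mu, sigma)
```

With the leaky variant, the mean shifts toward the test values, which means the model is trained on inputs shaped by data it should never have seen.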

Google has released MLE-STAR as open source, built on the company's Agent Development Kit. Users are responsible for making sure that any models and web search content they use are properly licensed. For now, MLE-STAR is intended for research use only.
