
MLE-STAR, a new AI agent from Google Research, combines web search, targeted code refinement, and custom ensemble strategies to automate much of the machine learning process. Early results show significant performance gains with minimal human input.


The system is designed to tackle complex machine learning tasks across different data types, generating executable Python scripts from nothing more than a task description and the provided data. According to Google Research, most existing machine learning engineering (MLE) agents rely on standard tools like scikit-learn and have little flexibility to explore alternative models or pipeline components. They also tend to rewrite the entire codebase at once, making it hard to improve specific steps like feature engineering.

Web search instead of trial and error

MLE-STAR takes a multi-step approach. First, the agent uses web search to find up-to-date model ideas, then creates an initial solution. Next, it analyzes which part of the code - whether it's feature engineering, model selection, or building ensembles - has the biggest impact on performance.

The agent then zeroes in on that block, refining it step by step based on feedback from previous experiments. Each iteration uses the improved script from the round before as its starting point.
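
The loop described above can be pictured with a small sketch. This is not Google's implementation: it assumes the "biggest impact" step works like an ablation (swap one block for a plain baseline and measure the score drop), and every name in it (`most_impactful_block`, `refine_block`, `GAINS`, `toy_evaluate`) is hypothetical. The scores are made-up toy numbers.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: a pipeline maps each code block to the variant in use.
Pipeline = Dict[str, str]


def most_impactful_block(pipeline: Pipeline, baselines: Pipeline,
                         evaluate: Callable[[Pipeline], int]) -> str:
    """Swap each block for a plain baseline; return the block whose removal hurts most."""
    full_score = evaluate(pipeline)
    drops = {}
    for block in pipeline:
        ablated = dict(pipeline)
        ablated[block] = baselines[block]
        drops[block] = full_score - evaluate(ablated)
    return max(drops, key=drops.get)


def refine_block(pipeline: Pipeline, block: str, candidates: List[str],
                 evaluate: Callable[[Pipeline], int]) -> Tuple[Pipeline, int]:
    """Try candidate variants for one block; each round starts from the best script so far."""
    best, best_score = dict(pipeline), evaluate(pipeline)
    for variant in candidates:
        trial = dict(best)
        trial[block] = variant
        score = evaluate(trial)
        if score > best_score:  # keep a change only if validation improves
            best, best_score = trial, score
    return best, best_score


# Toy validation gains in points (assumed numbers, for illustration only).
GAINS = {
    "features": {"raw": 0, "scaled": 5},
    "model": {"linear": 0, "gbdt": 15, "tuned_gbdt": 20},
    "ensemble": {"none": 0, "mean": 2},
}


def toy_evaluate(pipeline: Pipeline) -> int:
    return 60 + sum(GAINS[block][variant] for block, variant in pipeline.items())


initial = {"features": "scaled", "model": "gbdt", "ensemble": "mean"}
baselines = {"features": "raw", "model": "linear", "ensemble": "none"}

target = most_impactful_block(initial, baselines, toy_evaluate)
refined, refined_score = refine_block(initial, target, ["tuned_gbdt", "linear"], toy_evaluate)
```

In this toy setup the model block is selected because replacing it with the baseline costs the most points, and only that block is changed while the rest of the script stays as it was.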

Image: Google

MLE-STAR also creates multiple solution variants and develops its own ensemble strategy, improving it over time to maximize predictive power. To keep results reliable, MLE-STAR includes three extra modules: a debugging agent that fixes runtime errors, a data leak checker that blocks unauthorized access to test data during training, and a data usage checker that makes sure all available data sources are used - not just basic CSV files.
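
As a rough illustration of what a learned ensemble strategy can look like, the sketch below blends the predictions of two solution variants and picks the blend weight by validation error. The article does not say which strategies MLE-STAR actually proposes, so the weighted average, the helper names (`mse`, `blend`, `best_weight`) and the toy numbers are all assumptions.

```python
from typing import List, Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def blend(preds_a: Sequence[float], preds_b: Sequence[float], w: float) -> List[float]:
    """Weighted average of two prediction lists; w is the weight on the first."""
    return [w * a + (1 - w) * b for a, b in zip(preds_a, preds_b)]


def best_weight(val_a: Sequence[float], val_b: Sequence[float],
                val_y: Sequence[float], steps: int = 4) -> float:
    """Grid-search the blend weight that minimizes validation error."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: mse(blend(val_a, val_b, w), val_y))


# Toy validation data: variant A overshoots slightly, variant B undershoots more.
val_y = [1.0, 2.0, 3.0]
val_a = [1.25, 2.25, 3.25]
val_b = [0.25, 1.25, 2.25]

w = best_weight(val_a, val_b, val_y)
blended = blend(val_a, val_b, w)
simple_mean = blend(val_a, val_b, 0.5)
```

Here the searched weight leans toward the more accurate variant and beats a plain average on the validation set. In a real pipeline the weight would have to be chosen on held-out data that is separate from the final test set.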

63 percent medal rate on Kaggle competitions

Google tested MLE-STAR on MLE-Bench-Lite, a benchmark suite based on real Kaggle competitions. The agent earned a medal in 63.6 percent of cases, a big jump from the previous best of 25.8 percent. It won gold in 36 percent of all cases. Google says all it took was a short initial prompt - the system handled the rest on its own.

This performance boost comes in part from using modern models like EfficientNet and ViT, instead of older architectures like ResNet favored by competing systems. MLE-STAR also allows for manual tweaks: for example, the RealMLP model was successfully integrated after its description was added by hand.

Fixing LLM hallucinations

The team found that code generated by Gemini 2.5 Flash and Pro was sometimes unrealistic or faulty, for example using test data for normalization. In these cases, the data leak checker stepped in. In Google's tests, the data usage checker also caught datasets that the generated scripts had ignored and made sure they were included.
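
The normalization mistake mentioned here is easy to reproduce. The following sketch, which uses only the Python standard library and made-up numbers, contrasts a leaky scaler fitted on training and test data together with a clean one fitted on training data alone. It illustrates the kind of error a leak checker looks for; it is not MLE-STAR's checker, and `fit_scaler` and `transform` are hypothetical helper names.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def fit_scaler(values: Sequence[float]) -> Tuple[float, float]:
    """Return the mean and population standard deviation used for scaling."""
    return mean(values), pstdev(values)


def transform(values: Sequence[float], stats: Tuple[float, float]) -> List[float]:
    """Standardize values with previously fitted statistics."""
    mu, sigma = stats
    return [(v - mu) / sigma for v in values]


train = [1.0, 2.0, 3.0]
test = [10.0, 11.0]

# Leaky: the test values shift the statistics that the training features are scaled with.
leaky_stats = fit_scaler(train + test)

# Clean: statistics come from the training split only and are reused for the test split.
clean_stats = fit_scaler(train)
train_scaled = transform(train, clean_stats)
test_scaled = transform(test, clean_stats)
```

With the leaky variant, information about the test distribution flows into training, so validation scores can look better than the model really is.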

Google has released MLE-STAR as open source, built on the company's Agent Development Kit. Users are responsible for making sure that any models and web search content they use are properly licensed. For now, MLE-STAR is intended for research use only.

Summary
  • Google Research has introduced MLE-STAR, an AI agent that automates much of the machine learning workflow by combining web search, targeted code improvements, and custom ensemble strategies, producing executable Python scripts from just a task description and dataset.
  • In tests on real Kaggle competitions, MLE-STAR earned medals in 63.6 percent of cases, more than double the prior best of 25.8 percent, and won gold in 36 percent. It relies on modern model architectures and includes automated modules for debugging, data leak prevention, and ensuring that all available data is used.
  • The system addresses common problems such as faulty code from large language models by adding dedicated checkers, and it is now available as open source for research use.
Max is the managing editor of THE DECODER, bringing his background in philosophy to explore questions of consciousness and whether machines truly think or just pretend to.