MLE-STAR: Google's AI Agent Automates ML Pipelines with Minimal Input

Google Research has introduced MLE-STAR, an AI agent designed to automate the construction of machine learning (ML) pipelines with minimal human intervention. Given only a task description and the accompanying data, the system generates executable Python scripts for ML tasks across a range of data types.

Earlier ML automation agents often rely on a limited set of standard tools and tend to lack flexibility when exploring diverse models or pipeline components. They typically rewrite an entire codebase at once, which makes it hard to improve a specific step, such as feature engineering, in a targeted way. MLE-STAR addresses these limitations with a multi-step, iterative approach.

The agent begins by using web search to find current model ideas, which it uses to construct an initial solution. It then analyzes the codebase to identify which segment (feature engineering, model selection, or ensemble construction, for example) has the greatest impact on overall performance. With this insight, MLE-STAR refines that specific code block step by step, incorporating feedback from previous experiments and using the improved script as the starting point for the next iteration.
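The loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not MLE-STAR's actual code: the score table, the candidate lists, and the names `evaluate`, `propose`, `most_impactful_block`, and `refine` are all hypothetical stand-ins for steps that the real agent performs by running scripts and querying a language model.

```python
from typing import Dict, List, Set, Tuple

Pipeline = Dict[str, str]              # block name -> chosen variant
History = List[Tuple[str, str, int]]   # (block, variant, score)

# Hypothetical score table standing in for "run the script, read the validation metric".
SCORES = {
    "features": {"none": 0, "raw": 50, "target_enc": 120},
    "model": {"none": 0, "linear": 600, "gbdt": 750},
    "ensemble": {"none": 0, "single": 0, "average": 30},
}
# Hypothetical candidate variants; in the real agent a language model proposes these.
CANDIDATES = {"features": ["target_enc"], "model": ["gbdt"], "ensemble": ["average"]}


def evaluate(pipeline: Pipeline) -> int:
    """Toy metric: sum of the contributions of the chosen variants."""
    return sum(SCORES[block][variant] for block, variant in pipeline.items())


def propose(block: str, history: History) -> List[str]:
    """Return candidate variants for a block, skipping ones already tried."""
    tried = {variant for b, variant, _ in history if b == block}
    return [v for v in CANDIDATES[block] if v not in tried]


def most_impactful_block(pipeline: Pipeline, skip: Set[str]) -> str:
    """Ablation: disable each block in turn and pick the one whose removal moves the score most."""
    full = evaluate(pipeline)
    impact = {}
    for block in pipeline:
        if block in skip:
            continue
        ablated = dict(pipeline, **{block: "none"})
        impact[block] = abs(full - evaluate(ablated))
    return max(impact, key=impact.get)


def refine(pipeline: Pipeline, outer_steps: int = 2) -> Tuple[Pipeline, int, History]:
    """Repeatedly target one block, try variants, and keep only improvements."""
    best, best_score = dict(pipeline), evaluate(pipeline)
    history: History = []
    refined: Set[str] = set()
    for _ in range(min(outer_steps, len(pipeline))):
        target = most_impactful_block(best, refined)
        refined.add(target)
        for variant in propose(target, history):
            trial = dict(best, **{target: variant})
            score = evaluate(trial)
            history.append((target, variant, score))  # feedback for later proposals
            if score > best_score:
                best, best_score = trial, score       # improved script is the new baseline
    return best, best_score, history


initial = {"features": "raw", "model": "linear", "ensemble": "single"}
best, best_score, history = refine(initial)
```

In this toy run the ablation first singles out the model block, then the feature block. Skipping blocks that were already refined is a simplification chosen for the sketch.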

Beyond its core refinement process, MLE-STAR includes several modules intended to make its results more robust. It can generate multiple solution variants and devise its own ensemble strategies, refining them iteratively to improve predictive performance. To guard against common pitfalls, the system integrates a debugging agent that fixes runtime errors, a data leak checker that catches use of test data during training, and a data usage checker that ensures all available data sources, not just basic CSV files, are used.
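To make the ensembling idea concrete, here is a minimal sketch of one simple strategy: blending the validation predictions of two solution variants and picking the blend weight with the lowest validation error. The article does not say which strategies MLE-STAR generates, so the grid search, the function names, and the example numbers below are all assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def blend(preds: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted average of several prediction vectors."""
    return [sum(w * p[i] for w, p in zip(weights, preds)) for i in range(len(preds[0]))]


def best_two_way_blend(
    pred_a: Sequence[float],
    pred_b: Sequence[float],
    y_val: Sequence[float],
    steps: int = 10,
) -> Tuple[float, float]:
    """Grid-search the weight on pred_a (pred_b gets 1 - w) that minimizes validation MSE."""
    best_w, best_err = 0.0, float("inf")
    for k in range(steps + 1):
        w = k / steps
        err = mse(blend([pred_a, pred_b], [w, 1.0 - w]), y_val)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err


# Hypothetical validation data: variant A over-predicts, variant B under-predicts.
y_val = [1.0, 2.0, 3.0, 4.0]
pred_a = [1.2, 2.2, 3.2, 4.2]
pred_b = [0.8, 1.8, 2.8, 3.8]
weight_a, blended_error = best_two_way_blend(pred_a, pred_b, y_val)
```

Here the two variants have opposite biases, so the equal-weight blend beats either one alone. The weight is chosen on validation data only, in keeping with the leak concerns the checkers address.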

Google tested MLE-STAR on MLE-Bench-Lite, a benchmark suite derived from actual Kaggle competitions. The agent won a medal in 63.6 percent of cases, up from the previous best of 25.8 percent. Notably, 36 percent of these were gold medals. Google attributes this success to MLE-STAR's ability to incorporate modern model architectures like EfficientNet and ViT, in contrast to competing systems that often favor older designs such as ResNet. The system also supports manual adjustments: after a description of the RealMLP model was supplied by hand, MLE-STAR successfully integrated it.

The development team observed instances where large language models like Gemini 2.5 Flash and Pro generated flawed code, such as using test data for normalization. MLE-STAR's integrated data leak checker effectively intervened in these situations. Similarly, the data usage checker identified and included datasets that were initially overlooked during testing.
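The normalization leak mentioned above is easy to show in isolation. The sketch below contrasts statistics computed over train and test data together (the leak) with statistics computed from the training split alone and then reused for the test split. The data and helper names are hypothetical. It shows the pattern a checker would aim to enforce, not how MLE-STAR's checker works internally.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Scaler = Tuple[float, float]  # (mean, standard deviation)


def fit_scaler(values: Sequence[float]) -> Scaler:
    """Compute normalization statistics from the given values only."""
    sigma = pstdev(values)
    return mean(values), sigma if sigma > 0 else 1.0


def transform(values: Sequence[float], scaler: Scaler) -> List[float]:
    """Standardize values with previously fitted statistics."""
    mu, sigma = scaler
    return [(v - mu) / sigma for v in values]


# Hypothetical feature column, split into train and test.
train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [100.0, 200.0]

# Leaky: the test rows influence the statistics used during training.
leaky_scaler = fit_scaler(train + test)

# Correct: fit on the training split only, then reuse the same statistics for test.
clean_scaler = fit_scaler(train)
train_scaled = transform(train, clean_scaler)
test_scaled = transform(test, clean_scaler)
```

With the leaky scaler, the mean shifts from 3.0 to 45.0 because the test values are mixed in. That is exactly the kind of information that should not be available at training time.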

MLE-STAR is now available as open source, built on Google's Agent Development Kit. Users are responsible for ensuring proper licensing for any models or web search content they use. For now, MLE-STAR is intended for research purposes only.

Related Articles

ByteDance's Seed-Prover Achieves SOTA in Automated Math Proving

Google's Gemini 2.5 Deep Think AI Model Wins Math Olympiad Gold

Google AI Ultra Subscribers Get Gemini 2.5 Deep Think for Complex Problem Solving
