Google Launches Gemini Deep Think AI for Advanced Parallel Reasoning
Google DeepMind has begun rolling out Gemini 2.5 Deep Think, an advanced AI reasoning model that tackles problems by exploring and evaluating multiple ideas in parallel, then selecting the best answer from those explorations. The capability is available starting Friday to subscribers of Google’s $250-per-month Ultra plan in the Gemini app.
First introduced at Google I/O 2025 in May, Gemini 2.5 Deep Think marks Google’s inaugural publicly available multi-agent model. These systems operate by deploying multiple AI agents that tackle a single question in parallel. While this method demands significantly more computational resources than a single-agent approach, it typically yields more accurate and comprehensive answers.
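In rough terms, the multi-agent approach works like this (a simplified illustration, not Google’s actual implementation; the `run_agent` and `score` functions here are hypothetical stand-ins for a reasoning model and an answer-selection step):

```python
from concurrent.futures import ThreadPoolExecutor

def run_agent(question: str, seed: int) -> str:
    # Hypothetical stand-in for one reasoning agent; each seed
    # represents a different line of thought on the same question.
    return f"candidate answer {seed} for: {question}"

def score(answer: str) -> float:
    # Hypothetical scoring step; a real system might use a learned
    # verifier or the model's own self-evaluation.
    return float(len(answer))

def deep_think(question: str, n_agents: int = 4) -> str:
    # Launch several agents on the same question concurrently --
    # this is why the approach costs far more compute than a
    # single-agent run...
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        candidates = list(
            pool.map(lambda s: run_agent(question, s), range(n_agents))
        )
    # ...then return the highest-scoring candidate as the final answer.
    return max(candidates, key=score)
```

The extra cost comes directly from the fan-out: every agent performs a full reasoning pass, so compute scales with the number of parallel explorations.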
A variant of Gemini 2.5 Deep Think notably secured a gold medal at this year’s International Math Olympiad (IMO). Alongside the public release, Google is making the specific IMO-winning model available to a select group of mathematicians and academics. The company notes that, unlike most consumer-facing AI, this specialized model requires hours, rather than seconds or minutes, to complete its reasoning. Google hopes the release will foster research and generate feedback for refining multi-agent systems for academic applications.
Google asserts that Gemini 2.5 Deep Think represents a substantial improvement over the version previewed at I/O. The company also highlights the development of “novel reinforcement learning techniques” to optimize the model’s utilization of its reasoning pathways. In a blog post, Google stated that “Deep Think can help people tackle problems that require creativity, strategic planning and making improvements step-by-step.”
On the Humanity’s Last Exam (HLE), a rigorous test assessing AI’s proficiency across thousands of crowdsourced questions in math, humanities, and science, Gemini 2.5 Deep Think achieved a score of 34.8% without the aid of external tools. This performance surpasses xAI’s Grok 4, which scored 25.4%, and OpenAI’s o3, at 20.3%. Furthermore, Google’s model outperformed competitors on LiveCodeBench 6, a challenging benchmark for competitive coding tasks, scoring 87.6% compared to Grok 4’s 79% and OpenAI’s o3’s 72%.
Gemini 2.5 Deep Think integrates with tools such as code execution and Google Search, and can generate “much longer responses” than conventional AI models. Google’s internal tests indicate that the model produces more detailed and aesthetically refined results for web development tasks than other AI systems, which the company says could aid researchers and accelerate discovery.
The adoption of multi-agent systems appears to be a growing trend among leading AI laboratories. Elon Musk’s xAI recently launched Grok 4 Heavy, its own multi-agent system, which claims industry-leading performance across several benchmarks. Similarly, OpenAI’s unreleased AI model, which also earned a gold medal at this year’s International Math Olympiad, is reportedly a multi-agent system. Anthropic’s Research agent, known for generating comprehensive research briefs, is also powered by a multi-agent architecture.
Despite their strong performance capabilities, multi-agent systems are considerably more computationally intensive and, consequently, more expensive to operate than traditional AI models. This economic reality suggests that tech companies may continue to reserve these advanced systems for their premium subscription tiers, a strategy now employed by both xAI and Google.
In the coming weeks, Google plans to extend access to Gemini 2.5 Deep Think to a select group of testers via the Gemini API, aiming to gain insights into how developers and enterprises might leverage its multi-agent system.