Google Upgrades Gemini with Deep Think, Raising Safety Concerns
Google has unveiled “Deep Think,” a significant upgrade to its Gemini AI model, designed to tackle complex problems by allowing the artificial intelligence more “thinking time.” The new feature is now accessible to Google AI Ultra subscribers within the Gemini application. Google states that this release, which incorporates both tester feedback and recent research, represents a clear advancement over the version showcased at I/O earlier this year.
Deep Think can be activated within the app, though its usage is subject to a daily request limit. It is engineered to automatically leverage tools such as code execution and Google Search, enabling it to produce considerably longer and more detailed responses than previous iterations.
The core of Deep Think’s enhanced capability lies in what Google describes as “parallel thinking” techniques. This approach aims to emulate how humans approach difficult problems: by simultaneously generating, evaluating, and combining multiple ideas to arrive at the optimal solution. To facilitate this, the model is allocated additional “inference time” – essentially, more processing time – before it delivers its response. Similar experimental methods, such as Self-Consistency and Tree of Thoughts, already exist; Deep Think builds on them with new reinforcement learning techniques intended to ensure the expanded reasoning paths are used productively, so that its problem-solving ability improves over time (a rough sketch of the general pattern follows below). The underlying Gemini 2.5 model uses a Sparse Mixture-of-Experts (MoE) architecture and supports a context window of up to one million tokens for input and 192,000 tokens for output.
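Google has not published Deep Think’s internals, but Self-Consistency, one of the forerunners mentioned above, conveys the basic idea behind parallel thinking: sample several independent reasoning paths and aggregate their final answers, trading extra inference time for reliability. The sketch below is a minimal illustration of that general pattern, not Google’s implementation; `sample_reasoning_path` is a hypothetical stand-in for a stochastic language-model call.

```python
import random
from collections import Counter


def sample_reasoning_path(question: str) -> str:
    """Hypothetical stand-in for a single stochastic model call.

    A real system would send `question` to a language model at a
    non-zero temperature and extract the final answer from the
    resulting chain of thought. Here we simulate paths that usually,
    but not always, converge on the same answer.
    """
    return "42" if random.random() < 0.7 else random.choice(["41", "43"])


def self_consistent_answer(question: str, num_paths: int = 16) -> str:
    """Self-Consistency: draw several independent reasoning paths
    (issued concurrently in a production system) and return the
    answer that the majority of paths agree on."""
    answers = [sample_reasoning_path(question) for _ in range(num_paths)]
    majority, count = Counter(answers).most_common(1)[0]
    print(f"{count}/{num_paths} paths agreed on {majority!r}")
    return majority


if __name__ == "__main__":
    random.seed(0)
    print(self_consistent_answer("What is 6 * 7?"))
```

Deep Think reportedly goes beyond simple majority voting, using reinforcement learning to decide how paths are generated, evaluated, and recombined, but the sample-then-aggregate skeleton above is the common core of such methods.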
Google highlights Deep Think’s particular strength in tasks demanding creativity and strategic planning, such as iteratively improving web designs, supporting advanced scientific and mathematical research, and solving intricate programming challenges. In benchmark tests, Gemini 2.5 Deep Think performed strongly, scoring 87.6% on LiveCodeBench V6 for code generation and 34.8% on Humanity’s Last Exam for knowledge and logical reasoning. These results reportedly surpass rival models such as OpenAI’s o3 and Grok 4 when external tools are not used.
Notably, this public release is a modified version of the AI model that achieved a gold medal at the International Mathematical Olympiad (IMO). While the IMO-winning variant required hours to solve its problems, the public version is optimized for speed and everyday use, still managing to achieve bronze-medal performance on the 2025 IMO benchmark. The full, gold-level model remains exclusively available to a select group of mathematicians and researchers.
However, this leap in capability also brings new safety considerations, as Google acknowledges. A comprehensive safety review, conducted under the “Frontier Safety Framework” (FSF) because of “exceptional differences” from earlier models, found that Deep Think has crossed a critical threshold in certain risk areas. Specifically, in the Chemical, Biological, Radiological, and Nuclear (CBRN) domains, the model has reached the “early warning alert threshold” for “Uplift Level 1.” This means the AI could provide enough technical knowledge to significantly assist low-resourced individuals or groups in developing weapons of mass destruction. Google says it is continuing to evaluate these risks and has already implemented precautionary measures.
Deep Think also meets the same early warning threshold for cybersecurity that was previously identified with Gemini 2.5 Pro. While its performance in cybersecurity tasks has improved, it continues to face challenges with the most demanding real-world scenarios.
In response to these findings, Google states it has implemented multiple layers of safeguards. These measures include filtering dangerous outputs, multi-level monitoring, blocking abusive accounts, and ongoing “red-teaming” exercises to rigorously test its protection systems.
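Google has not described how these layers are implemented. As a purely illustrative sketch of the defense-in-depth pattern the description suggests, the following combines an output filter, a monitoring hook, and account blocking. Every name and threshold here is hypothetical, and a real deployment would use trained safety classifiers rather than string matching.

```python
from collections import defaultdict

BLOCK_THRESHOLD = 3  # hypothetical: flagged responses before an account is blocked
flagged_counts: dict[str, int] = defaultdict(int)
blocked_accounts: set[str] = set()


def output_filter(text: str) -> bool:
    """Hypothetical first layer: decide whether a model response is
    safe to return. Real systems use trained safety classifiers,
    not keyword checks like this placeholder."""
    return "dangerous synthesis route" not in text.lower()


def monitor(account: str, prompt: str, safe: bool) -> None:
    """Hypothetical second layer: log every exchange so automated
    monitors and red-teamers can review it later."""
    print(f"[monitor] account={account} safe={safe} prompt={prompt!r}")


def serve(account: str, prompt: str, model_response: str) -> str:
    """Layered safeguards: filter the output, monitor the exchange,
    and block accounts that repeatedly trigger the filter."""
    if account in blocked_accounts:
        return "Account blocked."
    safe = output_filter(model_response)
    monitor(account, prompt, safe)
    if not safe:
        flagged_counts[account] += 1
        if flagged_counts[account] >= BLOCK_THRESHOLD:
            blocked_accounts.add(account)
        return "Response withheld by safety filter."
    return model_response
```

The point of layering is that no single safeguard has to be perfect: a response that slips past the filter is still logged for review, and accounts that repeatedly trigger flags lose access entirely.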