OpenAI Unveils GPT-5: Unified AI with Adaptive Reasoning for Complex Tasks


OpenAI has officially launched GPT-5, which it describes as a unified AI system built for adaptive reasoning across complex tasks. The new architecture consolidates OpenAI’s previous model lines into a single system that dynamically adjusts its computational “thinking effort” to the complexity of each query, a design choice aimed at delivering more reliable and accurate responses.

Access to GPT-5 will be tiered, a significant shift for free users, who can for the first time experiment with a model specifically engineered for logical reasoning. Paying subscribers, by contrast, get higher usage limits and a suite of exclusive features.

The core of GPT-5 isn’t a single monolithic model but an integrated system. A fast, efficient model, gpt-5-main, handles the majority of routine queries, while a deeper reasoning model, gpt-5-thinking, is invoked for intricate problems. A real-time router, continuously refined through user feedback, selects the appropriate model based on factors such as question difficulty, conversational context, or explicit user directives like “think carefully about this.” For “Pro” subscribers, OpenAI offers GPT-5 Pro, a variant that spends even more processing time reasoning through challenging questions; external evaluators reportedly preferred it over gpt-5-thinking in nearly 68 percent of difficult scenarios.
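OpenAI has not published how this router works internally. Purely as an illustration of the routing concept described above, and not of OpenAI’s actual implementation, a dispatch layer of this kind could be sketched as follows; the difficulty heuristic, cue list, and threshold are invented for the example, and only the model names come from the announcement:

```python
# Purely illustrative sketch of a query router; this is NOT OpenAI's implementation.
# Only the model names (gpt-5-main, gpt-5-thinking) come from OpenAI's announcement;
# the heuristic, cues, and threshold are made up for this example.

EXPLICIT_REASONING_CUES = ("think carefully", "step by step", "reason through")

def estimate_difficulty(query: str, conversation: list[str]) -> float:
    """Toy difficulty score in [0, 1] based on query length and conversation depth."""
    length_signal = min(len(query.split()) / 200, 1.0)    # longer prompts lean harder
    context_signal = min(len(conversation) / 20, 1.0)     # deeper threads lean harder
    return 0.7 * length_signal + 0.3 * context_signal

def route(query: str, conversation: list[str], threshold: float = 0.5) -> str:
    """Pick a model: explicit user cues or a high difficulty score trigger the reasoning model."""
    if any(cue in query.lower() for cue in EXPLICIT_REASONING_CUES):
        return "gpt-5-thinking"
    if estimate_difficulty(query, conversation) >= threshold:
        return "gpt-5-thinking"
    return "gpt-5-main"  # fast default for routine queries

print(route("What's the capital of France?", []))                   # gpt-5-main
print(route("Think carefully about this proof of the lemma.", []))  # gpt-5-thinking
```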

OpenAI claims GPT-5 sets new marks across diverse domains, including programming, healthcare, and writing. In coding, the model is touted for building complex front-end interfaces and debugging extensive codebases, scoring 74.9 percent on SWE-bench Verified and 88 percent on Aider Polyglot and reportedly cutting error rates by two-thirds compared to earlier models. For health-related inquiries, GPT-5 is meant to give more precise answers and act as an “active thought partner” that asks follow-up questions. It scored 46.2 percent on the demanding HealthBench Hard test, up from its predecessor’s 31.6 percent, though OpenAI emphasizes it is not a substitute for medical professionals. Other benchmarks show further gains: GPT-5 scores 94.6 percent on AIME 2025 (math, no tools) and 84.2 percent on MMMU (multimodal understanding), while the premium GPT-5 Pro reportedly reaches 88.4 percent on GPQA, a benchmark of highly difficult science questions.

A key promise of GPT-5 is a substantial reduction in “hallucinations”—the generation of factually incorrect or nonsensical information. With web search activated, OpenAI says the model is roughly 45 percent less prone to factual errors than GPT-4o; in its pure “thinking” mode, the error rate drops by 80 percent compared to its predecessor. On open, fact-based benchmarks such as LongFact and FActScore, GPT-5 produced roughly six times fewer hallucinations. Even without up-to-date web data, GPT-5’s “thinking” mode keeps hallucination rates between 0.8 and 1.4 percent on LongFact-Concepts, LongFact-Objects, and FActScore, down from the 24 to 38 percent OpenAI reports for earlier models, which the company describes as more than five times fewer factual mistakes. The model is also engineered to be more transparent about its own limitations: in one test involving questions about non-existent images on the CharXiv benchmark, GPT-5 gave confident, made-up answers only 9 percent of the time, compared to its predecessor’s 86.7 percent. Overall, the deception rate in representative conversations reportedly fell from 4.8 percent to 2.1 percent with GPT-5.

GPT-5 introduces “Safe Completions,” a new safety paradigm detailed in an accompanying research paper. It replaces the previous “hard refusal” approach, which OpenAI deemed too inflexible, particularly for ambiguous or dual-use topics where information could serve both beneficial and harmful purposes. Instead of outright blocking requests, GPT-5 focuses on making its output safe rather than solely judging user intent: the model tries to give the most helpful response possible within predefined safety guidelines, which might mean a high-level overview, a partial answer, or an alternative perspective. Human evaluators reportedly found this approach safer, more helpful, and better balanced. Separately, GPT-5-thinking has been rated “high capability” for biology and chemistry under OpenAI’s Preparedness Framework, following more than 5,000 hours of red teaming conducted by partners such as CAISI (US) and the UK AISI.

Beyond its core capabilities, GPT-5 brings several new features to its API, offering developers enhanced control over the model’s reasoning effort and verbosity. “Custom Tools” can now be invoked using plain text rather than strict JSON, which is expected to minimize errors for complex inputs. The context window has been significantly expanded to accommodate 272,000 input tokens and 128,000 output tokens. The API now offers three distinct model sizes: gpt-5, gpt-5-mini, and gpt-5-nano, with gpt-5 designated as the most powerful “thinking” variant, priced at $1.25 per million input tokens and $10 per million output tokens.
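As a rough sketch of how these controls look from a developer’s perspective, the example below uses OpenAI’s Python SDK and assumes the Responses API exposes the reasoning-effort and verbosity settings in the shape shown; treat the exact parameter names, allowed values, and usage fields as assumptions to verify against the current API reference rather than a definitive guide:

```python
# Sketch of calling GPT-5 through OpenAI's Python SDK (pip install openai).
# Assumption: the Responses API accepts reasoning-effort and verbosity controls
# in the shape shown below; verify names and allowed values in the API docs.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5",                    # also available: gpt-5-mini, gpt-5-nano
    reasoning={"effort": "minimal"},  # how much "thinking" to spend on the request
    text={"verbosity": "low"},        # how long and detailed the answer should be
    input="Summarize the trade-offs between a hash map and a B-tree index.",
)

print(response.output_text)

# Rough cost estimate at the announced list price for gpt-5:
# $1.25 per million input tokens and $10 per million output tokens.
usage = response.usage
cost = usage.input_tokens / 1e6 * 1.25 + usage.output_tokens / 1e6 * 10.0
print(f"approximate cost: ${cost:.6f}")
```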

The ChatGPT user interface is also receiving updates. The new model is designed to be considerably less “sycophantic”; in tests, this behavior reportedly dropped from 14.5 percent to under 6 percent. Users will be able to customize the visual appearance of their chats and, as a research preview, choose from four preset personalities such as “Cynic” or “Nerd.” The rollout begins immediately: GPT-5 becomes the new default model for Team, Enterprise, and Education customers, Plus subscribers receive higher usage limits, and Pro users get unlimited access to GPT-5 along with exclusive access to GPT-5 Pro.