GPT-5: Leaks suggest modest upgrade, not a breakthrough for OpenAI

2025-08-02 · Decoder

OpenAI is preparing to launch GPT-5, its next flagship large language model, but expectations point to a modest upgrade rather than a revolutionary leap. Internal testing of GPT-5, the successor to GPT-4 released in March 2023, indicates progress in programming, mathematics, and executing complex instructions, including automating customer service workflows. However, the anticipated jump in capability is considerably smaller than the leap from GPT-3 (2020) to GPT-4 (2023).

Sources familiar with the evaluations say GPT-5 will enable more user-friendly applications and manage its computational resources more effectively. Even so, the overall gains are described as incremental.

This apparent plateau aligns with predictions from prominent AI figures. Microsoft co-founder Bill Gates forecast it in late 2023, and LLM critics such as Gary Marcus, former OpenAI chief scientist Ilya Sutskever, and Meta's Yann LeCun have repeatedly contended that the Transformer architecture underpinning most current large language models is reaching its limits.

A telling example of these challenges is OpenAI's internal "Orion" model. Initially developed as a direct successor to GPT-4o, Orion failed to deliver the anticipated gains. It was subsequently released as GPT-4.5 in early 2025, rather than earning the GPT-5 designation. GPT-4.5 made little impact, reportedly ran slower and cost more than GPT-4o, and quickly faded from prominence. A core issue, according to The Information, was that pretraining modifications that worked for smaller models did not scale effectively to larger ones. Concurrently, OpenAI faced a dwindling supply of high-quality web data for training. As recently as June 2025, none of OpenAI's models in development were considered strong enough to be called GPT-5.

This challenge isn't exclusive to OpenAI. Anthropic's recent Claude 4 models also delivered only modest overall improvements, aside from a notable boost in coding performance. Anthropic already uses a hybrid architecture that combines a large language model with specialized reasoning components, an approach OpenAI may also adopt for GPT-5.

Beyond its main generative models, OpenAI has been developing "large reasoning models" (LRMs). These models tend to perform better on complex tasks when allocated more computational power and could become valuable tools for math, web search, and programming—or even point to entirely new directions for language models. However, open questions remain about their generalizability and energy requirements.
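How exactly LRMs convert extra compute into better answers is not detailed in the leaks, but one widely published pattern is test-time scaling through repeated sampling with a majority vote (self-consistency). Below is a minimal sketch of that idea only; the generate() function is a hypothetical stand-in for a model call, simulated here by a noisy solver:

```python
import random
from collections import Counter

def generate(prompt: str) -> str:
    """Hypothetical stand-in for one call to a reasoning model.
    Simulated here as a noisy solver that answers correctly 60% of the time."""
    return "42" if random.random() < 0.6 else str(random.randint(0, 99))

def solve_with_budget(prompt: str, n_samples: int) -> str:
    """More compute (more samples) yields a more reliable majority answer."""
    answers = [generate(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

# Accuracy typically climbs with the sampling budget:
for budget in (1, 5, 25):
    hits = sum(solve_with_budget("toy problem", budget) == "42" for _ in range(200))
    print(f"n={budget:>2}: {hits / 200:.0%} correct")
```

The point is simply that reliability climbs with the sampling budget, which is also why these models are expensive to run.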

A significant breakthrough in this area for OpenAI was the Q* model in late 2023, which reportedly solved math problems it hadn't encountered before. Building on this, OpenAI developed the o1 and o3 models, both based on GPT-4o and designed for specialized applications. Both o1 and o3 were trained using reinforcement learning (RL), with the o3 "teacher model" receiving significantly more compute and direct access to web and code sources. During RL training, the model generates answers to expert-level questions and improves itself by comparing its responses to human solutions.
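OpenAI's actual pipeline is not public, but the description above maps onto a familiar loop: sample a response, score it against a human reference, and reinforce. A minimal sketch, assuming a hypothetical ToyPolicy and a toy reward() grader that gives credit for matching the expert solution:

```python
from dataclasses import dataclass

@dataclass
class Example:
    question: str
    reference: str  # the human expert solution

def reward(response: str, reference: str) -> float:
    """Toy grader: full credit for matching the human solution, else nothing.
    Real graders are reportedly more nuanced (partial credit, verifiers)."""
    return 1.0 if response.strip() == reference.strip() else 0.0

class ToyPolicy:
    """Stand-in for the language model being trained."""
    def sample(self, question: str) -> str:
        return "x = 3"  # placeholder generation

    def update(self, question: str, response: str, r: float) -> None:
        pass  # a real implementation would apply a policy-gradient step here

def rl_step(policy: ToyPolicy, batch: list[Example]) -> float:
    """One outer loop: generate, score against human solutions, reinforce."""
    total = 0.0
    for ex in batch:
        response = policy.sample(ex.question)
        r = reward(response, ex.reference)
        policy.update(ex.question, response, r)
        total += r
    return total / len(batch)

policy = ToyPolicy()
batch = [Example("Solve 2x + 1 = 7 for x.", "x = 3")]
print("mean reward:", rl_step(policy, batch))
```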

However, when these models were adapted for chat, o3 reportedly lost some of its capability. As one source told The Information, the chat version had to be "dumbed down" because it wasn't trained enough for real conversation, which hurt performance in both chat and API settings. This issue was highlighted by the ARC-AGI benchmark in April, where the public o3 version performed worse on a tough puzzle test than the internal base model, showing that many original reasoning abilities didn't survive the transition to chat.

The o3-pro model further illustrates this delicate balance. While experts rated o3-pro highly for science, programming, and business tasks, it struggled with simple daily tasks. For instance, replying to "Hi, I'm Sam Altman" took several minutes and racked up $80 in compute costs for a trivial answer—a textbook case of overthinking. GPT-5 aims to strike a balance between specialized reasoning power and practical conversational utility.

Despite these technical hurdles, GPT-5 is intended to drive progress in "agentic" systems—applications where an AI can carry out multiple steps on its own. The new model should be able to follow complex instructions more efficiently, with less human oversight. GPT-5 is also projected to surpass GPT-4o in capability without using much more compute. Internal tests show it's better at gauging which tasks need more or less computing power, which could make processes more efficient and help avoid the kind of overthinking seen in models like o3-pro.
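What "gauging which tasks need more or less computing power" could look like in practice is not specified. As a rough sketch of the general routing idea, here a made-up difficulty heuristic stands in for whatever the model learns internally:

```python
def estimate_difficulty(prompt: str) -> float:
    """Toy stand-in: longer, math-heavy prompts score higher.
    A real system would use a learned signal, not keyword matching."""
    signals = ("prove", "optimize", "debug", "integral")
    return min(1.0, len(prompt) / 500 + 0.3 * sum(s in prompt.lower() for s in signals))

def pick_budget(prompt: str) -> dict:
    """Map estimated difficulty to a compute budget, so greetings stay cheap
    and hard problems get extended reasoning."""
    d = estimate_difficulty(prompt)
    if d < 0.2:
        return {"reasoning_tokens": 0, "samples": 1}      # fast chat path
    if d < 0.6:
        return {"reasoning_tokens": 2_000, "samples": 1}  # light reasoning
    return {"reasoning_tokens": 20_000, "samples": 8}     # deliberate mode

print(pick_budget("Hi, I'm Sam Altman"))
print(pick_budget("Prove that the integral of x**2 from 0 to 1 is 1/3."))
```

Routing of this kind is exactly what would have kept o3-pro from spending $80 on a greeting.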

For OpenAI, even modest improvements in GPT-5 could be sufficient to keep customers and investors engaged. The company is still growing fast, despite high operating costs. In the competitive field of coding-related AI, where Anthropic currently leads with its Claude models, OpenAI is hoping to regain ground with GPT-5.

OpenAI is increasingly leveraging reinforcement learning, especially a "universal verifier" that automatically rates the quality of model responses—even for subjective tasks like creative writing. This universal verifier was also used in the OpenAI model that recently won gold at the International Mathematical Olympiad. OpenAI researcher Jerry Tworek has suggested that this RL system could form the basis for general artificial intelligence (AGI).
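No details of the universal verifier have been published. As an illustration of the general pattern only, assume a hypothetical learned scorer verifier_score() that returns a quality value even for subjective outputs; the same signal that serves as an RL reward can also rank candidates at inference time:

```python
import hashlib

def verifier_score(prompt: str, response: str) -> float:
    """Hypothetical stand-in for a learned 'universal verifier' returning
    a quality score in [0, 1), even for subjective tasks like creative writing."""
    digest = hashlib.sha256((prompt + "\x00" + response).encode()).digest()
    return int.from_bytes(digest[:4], "big") / 2**32

def best_of_n(prompt: str, candidates: list[str]) -> str:
    """Verifier-guided selection: the same scoring signal that can steer RL
    training is used here to rank candidate responses at inference time."""
    return max(candidates, key=lambda c: verifier_score(prompt, c))

drafts = ["Draft A ...", "Draft B ...", "Draft C ..."]
print(best_of_n("Write a short poem about the sea.", drafts))
```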
