GPT-5's AGI Claims Questioned: Has AI Plateaued?

The Conversation

OpenAI’s latest flagship model, GPT-5, is being heralded by the company as a “significant step” toward Artificial General Intelligence (AGI), a hypothetical form of AI able to autonomously outperform humans at most economically valuable tasks. Yet despite these grand claims, OpenAI CEO Sam Altman’s descriptions of GPT-5’s advancements are remarkably understated. He highlights improvements in coding ability, a reduction in “hallucinations” (where the AI generates false information), and better adherence to multi-step instructions, particularly when integrating with other software. The model is also reportedly safer and less “sycophantic”: it is designed not to deceive or provide harmful information simply to please the user.

Altman suggests that interacting with GPT-5 feels akin to conversing with a PhD-level expert on any given topic. However, this assertion is immediately tempered by the model’s fundamental inability to ascertain the accuracy of its own output. For instance, it struggles with basic tasks like accurately drawing a map of North America. Furthermore, GPT-5 cannot learn from its own experiences and achieved only 42% accuracy on “Humanity’s Last Exam,” a challenging benchmark covering diverse scientific and academic subjects. This performance slightly trails Grok 4, a competing model from Elon Musk’s xAI, which reportedly reached 44%.

The primary technical innovation underpinning GPT-5 appears to be the introduction of a “router.” This component decides which internal GPT model should handle a given query, in effect judging how much computational effort an answer warrants, and refines those decisions using feedback on previous choices. The router can delegate to prior flagship GPT models or to a new, dedicated “deeper reasoning” model called GPT-5 Thinking. The precise nature of this new model remains unclear: OpenAI has not indicated that it relies on novel algorithms or new datasets, and most available data has already been extensively used. This invites speculation that GPT-5 Thinking might simply be an elaborate mechanism for prompting existing models multiple times, pushing them harder to yield better results.
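
OpenAI has not published how the router works internally, but the idea can be illustrated with a toy sketch in Python. Everything below is hypothetical: the model tiers, the estimate_difficulty heuristic and the threshold-nudging feedback rule are invented for illustration, not a description of OpenAI’s design.

```python
# Toy sketch of a query router. All tiers, names and heuristics are
# hypothetical illustrations, not OpenAI's actual implementation.

def estimate_difficulty(query: str) -> float:
    """Crude stand-in for whatever signal a real router might use."""
    cues = ("prove", "step by step", "debug", "why")
    score = min(len(query) / 500, 1.0)
    if any(cue in query.lower() for cue in cues):
        score += 0.6
    return min(score, 1.0)

class Router:
    """Pick a model tier by estimated difficulty, then nudge the
    escalation threshold using feedback on earlier choices."""

    def __init__(self) -> None:
        self.threshold = 0.6  # difficulty above this escalates

    def route(self, query: str) -> str:
        difficulty = estimate_difficulty(query)
        if difficulty > self.threshold:
            return "thinking"   # dedicated deeper-reasoning model
        if difficulty > 0.2:
            return "standard"   # a prior flagship model
        return "fast"           # cheap model for easy queries

    def feedback(self, answer_was_good: bool) -> None:
        # Failures lower the bar for escalating to costlier models;
        # successes raise it, routing more queries to cheaper tiers.
        self.threshold += 0.01 if answer_was_good else -0.05
        self.threshold = max(0.1, min(self.threshold, 0.9))

router = Router()
print(router.route("What is the capital of France?"))                 # fast
print(router.route("Prove that the sum of two odd numbers is even"))  # thinking
```

The feedback method is the sketch’s counterpart of the router “refining” its choices over time: each outcome slightly shifts where the boundary between cheap and expensive models sits.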

The foundation of today’s powerful AI systems lies in Large Language Models (LLMs), which are built on the “transformer” architecture that Google researchers introduced in 2017. These models excel at identifying complex patterns within vast sequences of words, the bedrock of human language. By training on immense quantities of text, LLMs learn to predict the most probable continuation of a given word sequence, allowing them to generate coherent and contextually relevant responses to user prompts. This approach, exemplified by systems like ChatGPT, has steadily improved as LLMs are exposed to ever-larger datasets. Fundamentally, these models operate like an intricate lookup table, mapping a user’s stimulus (prompt) to the most appropriate response. It is remarkable that such a seemingly simple concept has enabled LLMs to surpass many other AI systems in flexibility and usability, if not always in accuracy or reliability.
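
The lookup-table analogy can be made concrete with a toy next-word predictor. A real LLM learns smooth statistical patterns across billions of parameters rather than storing literal counts, but the predict-the-continuation loop looks schematically like this (the corpus and function names are invented for illustration):

```python
from collections import Counter, defaultdict

# A toy corpus; real models train on trillions of words.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows each word: a literal lookup table of
# continuation frequencies (real LLMs generalize far beyond counts).
next_word = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word[current][following] += 1

def predict(word: str) -> str:
    """Return the most frequent continuation seen in training."""
    candidates = next_word.get(word)
    return candidates.most_common(1)[0][0] if candidates else "<unknown>"

def generate(prompt: str, length: int = 5) -> str:
    """Repeatedly append the most probable next word."""
    words = prompt.split()
    for _ in range(length):
        words.append(predict(words[-1]))
    return " ".join(words)

print(generate("the"))  # "the cat sat on the cat"
```

Scale the table up by many orders of magnitude and replace raw counts with a neural network that can generalize to word sequences it has never seen, and the core loop behind systems like ChatGPT is recognizably the same.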

Despite their impressive capabilities, the jury is still out on whether LLMs can truly achieve AGI. Critics question their capacity for genuine reasoning, their ability to comprehend the world in a human-like manner, or their skill in learning from experience to refine their own behavior—all widely considered essential ingredients for AGI. In the interim, a thriving industry of AI software companies has emerged, dedicated to “taming” general-purpose LLMs to be more reliable and predictable for specific applications. These companies often employ sophisticated prompt engineering techniques, sometimes querying models multiple times or even utilizing numerous LLMs simultaneously, adjusting instructions until the desired outcome is achieved. In some cases, they “fine-tune” LLMs with specialized add-ons to enhance their effectiveness.
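
One widely used “taming” technique is to query a model several times and keep the answer the samples agree on, an approach often called self-consistency. A minimal sketch, assuming a hypothetical call_llm client rather than any particular vendor’s API:

```python
from collections import Counter

def call_llm(prompt: str, temperature: float = 0.8) -> str:
    """Hypothetical stand-in for a real LLM API call."""
    raise NotImplementedError("wire up your provider's client here")

def self_consistent_answer(prompt: str, samples: int = 5) -> str:
    """Sample several answers and return the most frequent one.

    Agreement across independent samples is a cheap proxy for
    reliability; it reduces, but does not eliminate, errors.
    """
    answers = [call_llm(prompt).strip() for _ in range(samples)]
    winner, votes = Counter(answers).most_common(1)[0]
    if votes <= samples // 2:
        # No clear majority: escalate to a human or a stronger model.
        return f"UNRELIABLE ({votes}/{samples} votes for {winner!r})"
    return winner
```

Fine-tuning, the other technique mentioned above, works differently: instead of wrapping the model in extra queries, it adjusts the model’s own weights using specialized data.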

OpenAI’s new router, built directly into GPT-5, aligns with this industry trend. If successful, this internal optimization could reduce the need for external AI engineers further down the supply chain and potentially make GPT-5 more cost-effective for users by delivering better results without additional embellishments. However, this strategic move could also be seen as an implicit admission that LLMs are approaching a plateau in their ability to fulfill the promise of AGI. If true, it would validate the arguments of scientists and industry experts who have long contended that current AI limitations cannot be overcome without moving beyond existing LLM architectures.

The emphasis on routing also echoes a concept known as “meta-reasoning,” which gained prominence in AI during the 1990s. This paradigm centered on the idea of “reasoning about reasoning”: deciding, for example, how much computational effort is worth investing to optimize a complex task. This approach, focused on breaking problems down into smaller, specialized components, was dominant before the shift towards general-purpose LLMs.
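
The core meta-reasoning question, namely how much the next unit of computation is worth, can be captured in a simple stopping rule. The sketch below is a generic illustration of that idea rather than a reconstruction of any specific 1990s system; step and value_of are problem-specific placeholders.

```python
def solve_with_metareasoning(problem, step, value_of, cost_per_step=1.0):
    """Refine a solution only while the marginal gain in quality
    exceeds the cost of one more computation step.

    `step` improves the current solution and `value_of` scores it;
    both are placeholders supplied by the caller in this sketch.
    """
    solution, current_value = None, float("-inf")
    while True:
        candidate = step(problem, solution)
        candidate_value = value_of(candidate)
        if candidate_value - current_value <= cost_per_step:
            # Further computation no longer pays for itself; the cost
            # of this step is sunk, so keep the better of the two.
            return candidate if candidate_value > current_value else solution
        solution, current_value = candidate, candidate_value
```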

The release of GPT-5, with its focus on internal delegation rather than groundbreaking new algorithms, may mark a significant pivot in AI evolution. While it might not signal a full return to older paradigms, it could usher in an era where the relentless pursuit of ever more complex and inscrutable models gives way to a focus on creating AI systems that are more controllable through rigorous engineering methods. Ultimately, this shift might serve as a powerful reminder that the original vision for artificial intelligence was not merely to replicate human intelligence, but also to deepen our understanding of it.