GPT-5 Underwhelms: Is AI Innovation Hitting a Wall?

OpenAI’s highly anticipated launch of its new artificial intelligence model, GPT-5, last week was poised to be a pivotal moment. CEO Sam Altman hailed it as “a significant step along the path to AGI,” or Artificial General Intelligence, referring to AI systems that could achieve or surpass human-level intelligence. Executives also hoped the model would refine the user experience of ChatGPT, the versatile chatbot that has become the fastest-growing consumer application in history.

However, the promised positive “vibes” quickly dissipated. Users flooded social media with screenshots of basic errors, such as a mislabeled US map, a problem that had dogged previous models. More critically, advanced users complained of a perceived shift in the model’s “personality” and of underwhelming performance on standard benchmarks relative to rivals. Despite the immense build-up, GPT-5 is now widely viewed as an incremental upgrade rather than the revolutionary leap of earlier GPT iterations. Thomas Wolf, co-founder of Hugging Face, summed up the reaction: “people expected to discover something totally new. And here we didn’t really have that.”

With hundreds of billions of dollars poured into generative AI, a pressing question now echoes through Silicon Valley: what if this is as good as it gets? For three years, AI researchers and investors have grown accustomed to a relentless pace of innovation. OpenAI, once seemingly unassailable, has seen competitors like Google, Anthropic, DeepSeek, and Elon Musk’s xAI rapidly close the gap. That pace fueled bold predictions of imminent AGI, with Altman forecasting its arrival during Donald Trump’s presidency. These soaring expectations, which underpin OpenAI’s projected $500 billion valuation, collided with reality when GPT-5 failed to impress. Gary Marcus, a prominent AI critic, put it starkly: “GPT-5 was this central icon of the entire approach of scaling to get to AGI, and it didn’t work.”

Stuart Russell, a computer science professor at UC Berkeley, draws parallels to the “AI winter” of the 1980s, when innovations failed to meet expectations and deliver returns. He recalls that the “bubble burst” then, as systems weren’t making money. Russell cautions that a similar scenario could unfold rapidly today, likening it to a game of musical chairs where everyone scrambles to avoid being left holding the “AI baby.” While some contend the technology remains in its nascent stages and capital continues to flow, Russell warns that overinflated expectations can backfire dramatically.

A core challenge stems from the prevailing method of building large language models: feeding in more data and computing power yields larger, more capable models. While many AI leaders believe these “scaling laws” will hold, the approach is running into resource limits. AI companies have largely exhausted freely available training data and are now pursuing content-sharing agreements with publishers. Training and running large models is also incredibly energy-intensive: GPT-4 was trained on thousands of Nvidia chips, while GPT-5 reportedly required hundreds of thousands of next-generation processors. Altman himself recently acknowledged these limits, stating that while underlying AI models are “still getting better at a rapid rate,” chatbots like ChatGPT are “not going to get much better.”
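
To make the idea concrete: “scaling laws” are empirical curve fits that predict how a model’s training loss falls as its parameter count and training data grow. The minimal sketch below uses the fitted constants from one widely cited such fit, the Chinchilla paper (Hoffmann et al., 2022); the constants and the worked numbers are illustrative, not figures drawn from this article.

```python
# Illustrative sketch of a scaling law, not a figure from this article.
# The Chinchilla fit (Hoffmann et al., 2022) predicts training loss L
# from parameter count N and training tokens D:
#     L(N, D) = E + A / N**alpha + B / D**beta

def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    E, A, B = 1.69, 406.4, 410.7   # fitted irreducible loss and scale terms
    alpha, beta = 0.34, 0.28       # fitted exponents
    return E + A / n_params**alpha + B / n_tokens**beta

# Doubling both parameters and data trims the predicted loss by only a few
# per cent, which is the diminishing-returns dynamic the article describes.
print(chinchilla_loss(70e9, 1.4e12))   # ~1.94 (roughly Chinchilla-sized)
print(chinchilla_loss(140e9, 2.8e12))  # ~1.89, a marginal improvement
```

Because the fitted exponents are well below 1, each successive doubling of data and compute buys less improvement than the last, which is why exhausting cheap data and cheap energy bites so hard.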

Some AI researchers argue that the intense focus on scaling large language models has inadvertently limited progress by overshadowing alternative research. Yann LeCun, Meta’s chief scientist, believes “We are entering a phase of diminishing return with pure LLMs trained with text.” However, he stresses this does not signify a ceiling for “deep-learning-based AI systems trained to understand the real world through video and other modalities.” These “world models” learn from aspects of the physical world beyond language, enabling planning, reasoning, and persistent memory, and could drive advances in areas such as self-driving cars and robotics. Joelle Pineau, chief AI officer at Cohere, concurs: “Simply continuing to add compute and targeting theoretical AGI won’t be enough.”

Suspicions about a slowdown in AI development are already influencing US trade and technology policy. Under President Joe Biden’s administration, the emphasis was firmly on safety and regulation, driven by concerns that AI’s rapid growth posed dangerous consequences. Donald Trump’s libertarian leanings always suggested AI regulation would be less of a priority, but even recently, national security concerns led Washington to threaten export controls on Nvidia’s H20 chips for China. A clear signal of Washington’s changing perspective came from David Sacks, Trump’s AI tsar, who declared that “Apocalyptic predictions of job loss are as overhyped as AGI itself.” Sacks posited that the AI market had achieved a “Goldilocks” state of balance, with close competition and a clear human role. Soon after, Trump struck a deal with Nvidia CEO Jensen Huang to resume H20 sales to China and even considered allowing modified versions of more powerful Blackwell systems. Analysts suggest that with AGI no longer viewed as an imminent risk, Washington’s focus has shifted towards ensuring US-made AI chips and models dominate globally.

While perhaps not OpenAI’s intention, the launch of GPT-5 underscores a fundamental shift: AI companies are “slowly coming to terms with the fact that they are building infrastructure for products,” according to Sayash Kapoor, a researcher at Princeton University. His team found GPT-5’s performance consistently mid-tier, though the model excelled at being “quite cost effective and also much quicker than other models.” This efficiency could unlock significant innovation in products and services, even without extraordinary breakthroughs towards AGI. Miles Brundage, an AI policy researcher, observes, “It makes sense that as AI gets applied in a lot of useful ways, people would focus more on the applications versus more abstract ideas like AGI.” Leading AI companies are now deploying “forward-deployed engineers” to embed models directly into client systems. Kapoor points out: “Companies wouldn’t do that if they thought they were close to automating all of human work for the rest of time.”

Despite the apparent slowdown in foundational