GPT-5 Underwhelms: AI Shifts from Pure Research to Application Focus
Sam Altman, CEO of OpenAI, set exceptionally high expectations for GPT-5 prior to its release last Thursday, stating its capabilities made him feel “useless relative to the AI” and evoking parallels to the developers of the atom bomb. This new offering was positioned not merely as an incremental upgrade, but as a pivotal step towards artificial general intelligence (AGI)—the long-promised frontier of AI that evangelists believe will fundamentally transform humanity for the better. Yet, against this backdrop of immense anticipation, GPT-5 has largely underwhelmed.
Early testers and critics have quickly highlighted glaring errors in GPT-5’s responses, directly contradicting Altman’s launch-day assertion that the model operates like “a legitimate PhD-level expert in anything any area you need on demand.” Issues have also emerged with OpenAI’s promise that GPT-5 would automatically select the best model for a given query—whether a complex reasoning model or a faster, simpler one. Altman himself appears to have conceded that the feature is flawed and compromises user control. On a more positive note, the new model reportedly curbs the previous iteration’s sycophancy, showing far less inclination to shower users with effusive compliments. Overall, as some observers have noted, the release feels more like a polished product update—with slicker, more aesthetically pleasing conversational interfaces—than a groundbreaking leap in AI capabilities.
This seemingly modest advance in raw intelligence reflects a broader shift within the AI industry. For a period, AI companies focused primarily on building the smartest possible models, akin to a universal “brain,” trusting that general intelligence would naturally translate into diverse applications, from poetry to organic chemistry. The strategy revolved around scaling models, refining training techniques, and pursuing fundamental technical breakthroughs. That approach now appears to be changing. With the anticipated breakthroughs not materializing as quickly as hoped, the current playbook involves aggressively marketing existing models for specific applications, often with ambitious claims. For instance, companies have increasingly asserted that their AI models can replace human coders, despite early evidence suggesting otherwise. This pivot implies that, for the foreseeable future, large language models may see only marginal improvements in core capabilities, compelling AI companies to maximize the utility of their current offerings.
Nowhere is this strategic shift more evident than in OpenAI’s explicit encouragement of users to turn to GPT-5 for health advice—a particularly fraught and sensitive domain. Initially, OpenAI largely steered clear of medical queries: ChatGPT often appended extensive disclaimers about its lack of medical expertise and sometimes refused to answer health-related questions altogether. However, reports indicate these disclaimers began to vanish with subsequent model releases. OpenAI’s models can now interpret X-rays and mammograms, and even pose follow-up questions designed to guide users toward a diagnosis.
This deliberate push into healthcare solidified in May with the announcement of HealthBench, a benchmark for evaluating how well AI models answer health questions, judged against the opinions of medical professionals. That was followed by a July study, co-authored by OpenAI, which reported that a group of Kenyan doctors made fewer diagnostic errors when assisted by an AI model. The launch of GPT-5 further cemented this trajectory, with Altman bringing on an OpenAI employee, Felipe Millon, and his wife, Carolina Millon, who had recently been diagnosed with multiple forms of cancer. Carolina described using ChatGPT to translate the medical jargon in her biopsy results and to weigh decisions such as whether to pursue radiation therapy. The trio presented this as an empowering example of bridging the knowledge gap between patients and physicians.
Yet this change in approach plunges OpenAI into dangerous territory. The company appears to be extrapolating from evidence that AI can serve as a useful clinical tool for trained doctors to the suggestion that people without medical backgrounds should seek personal health advice directly from AI models. A significant concern is that many users may follow such advice without ever consulting a physician, especially now that the chatbot rarely prompts them to do so. A stark illustration of the risk emerged just two days before GPT-5’s launch, when the Annals of Internal Medicine published a case study of a man who developed severe bromide poisoning—a condition largely eradicated in the US since the 1970s—after a conversation with ChatGPT led him to stop consuming table salt and ingest dangerous amounts of sodium bromide. He spent weeks in the hospital and nearly died.
At its core, this situation raises critical questions of accountability. When AI companies shift from promising abstract general intelligence to offering human-like helpfulness in specialized fields like healthcare, the question of who is liable for mistakes becomes paramount, and it remains largely unresolved. As Damien Williams, an assistant professor of data science and philosophy at the University of North Carolina Charlotte, points out, “When doctors give you harmful medical advice due to error or prejudicial bias, you can sue them for malpractice and get recompense.” He contrasts this sharply with AI: “When ChatGPT gives you harmful medical advice because it’s been trained on prejudicial data, or because ‘hallucinations’ are inherent in the operations of the system, what’s your recourse?” The current landscape offers little indication that tech companies will be held liable for the harm their AI models might cause.