OpenAI's 'PhD-level' GPT-5 struggles with basic spelling and geography
OpenAI has launched GPT-5, the latest iteration of the flagship model behind ChatGPT, making it accessible to the platform’s 700 million weekly users. CEO Sam Altman touted the new model as akin to having a “PhD-level expert in your pocket”, a significant leap from its “college student” predecessor. Early user experiences, however, have painted a curious picture, revealing unexpected struggles with fundamental tasks like spelling and geography.
Despite the ambitious claims, social media users quickly found the advanced chatbot making basic factual errors. One user on Bluesky, for instance, found GPT-5 repeatedly insisting that the word “blueberry” contained three ‘B’s (it has two). The AI confidently responded, “Yep – blueberry is one of those words where the middle almost trips you up, like it’s saying ‘b-b-better pay attention,’” adding, “That little bb moment is satisfying, though – it makes the word feel extra bouncy.”
The inaccuracies extended beyond spelling. Another user reported the chatbot’s inability to correctly identify US states containing the letter ‘R’. When prompted to generate a map, GPT-5 not only misspelled states as “Krizona” and “Vermoni” but also listed California twice and fabricated non-existent states such as “New Jefst” and “Mitroinia”. Guardian Australia’s own test of the model further highlighted these limitations: while GPT-5 could identify Australian states and territories containing an ‘R’, it incorrectly stated that the Northern Territory had only three ‘R’s instead of five, and rendered it as “Northan Territor” on a map.
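These letter counts are, for the record, trivially checkable in a few lines of code. The snippet below is a minimal illustration in plain Python (nothing beyond the standard library), confirming that “blueberry” contains two ‘B’s and “Northern Territory” five ‘R’s:

```python
# Count how often a letter appears in a phrase, ignoring case:
# exactly the kind of check the chatbot reportedly got wrong.
def count_letter(text: str, letter: str) -> int:
    return text.lower().count(letter.lower())

print(count_letter("blueberry", "b"))           # 2 (GPT-5 claimed 3)
print(count_letter("Northern Territory", "r"))  # 5 (GPT-5 claimed 3)
```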
When contacted for comment, OpenAI pointed to its statements at the product’s launch that GPT-5 would exhibit fewer errors and fewer “AI hallucinations”, the phenomenon in which an AI confidently fabricates information. One potential source of the issues lies in GPT-5’s architecture: it employs a “real-time router” that selects the most appropriate internal AI model for a given conversation based on its type and intent. OpenAI has suggested that users can prompt the AI to engage its most advanced reasoning model by instructing it to “think hard about this”. The company says the routing system is continuously refined through user feedback, including how often users switch models and their response preference rates.
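OpenAI has not published how the router works internally, but the general dispatch pattern it describes is simple to sketch. In the hypothetical Python sketch below, the model identifiers, the keyword heuristic and the route_prompt function are all invented for illustration; a production router would presumably use a learned classifier refined by the user feedback OpenAI mentions, not a keyword match.

```python
# Hypothetical sketch of a "real-time router": pick an internal model
# for each prompt based on simple intent cues. Model names and the
# keyword heuristic are invented for illustration only.
REASONING_MODEL = "gpt-5-reasoning"  # hypothetical identifier
FAST_MODEL = "gpt-5-fast"            # hypothetical identifier

# Phrases that, per OpenAI's guidance, should engage deeper reasoning.
REASONING_HINTS = ("think hard about this", "think longer")

def route_prompt(prompt: str) -> str:
    """Return the internal model a prompt should be routed to."""
    text = prompt.lower()
    if any(hint in text for hint in REASONING_HINTS):
        return REASONING_MODEL
    # A real router would classify conversation type and intent with a
    # learned model; a keyword check stands in for that here.
    return FAST_MODEL

print(route_prompt("How many R's are in Northern Territory?"))     # gpt-5-fast
print(route_prompt("Think hard about this: explain the passage"))  # gpt-5-reasoning
```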
However, observations from industry experts suggest the problem might run deeper. Dan Shipper, CEO of the media and AI startup Every, noted that GPT-5 sometimes hallucinates even on questions that should logically trigger its reasoning model. Shipper recounted an instance where, after photographing a passage from a novel and asking for an explanation, GPT-5 would “confidently make things up”. Yet he found that explicitly asking the AI to “think longer” often yielded an accurate response.
While OpenAI CEO Sam Altman acknowledged that the AI had not yet achieved artificial general intelligence (AGI)—a level of human-like cognitive ability—he described GPT-5 as “generally intelligent” and a “significant step on the path to AGI.” The current user experiences, however, underscore the persistent challenge of bridging the gap between sophisticated AI capabilities and the foundational accuracy expected even from a basic language model.