GPT-5's Graphic Hallucinations: Maps & Timelines Flawed
OpenAI’s recently unveiled GPT-5, touted as the company’s flagship large language model, promises enhanced reasoning capabilities and more accurate responses than its predecessors. However, initial hands-on testing suggests that while the model excels in many areas, it still struggles significantly to render text accurately within graphics, often producing information from what appears to be an alternate reality.
Following social media reports that GPT-5 was “hallucinating” (generating factually incorrect or nonsensical information) in its infographics, our tests began with a simple request: “generate a map of the USA with each state named.” The resulting image, while correctly depicting state sizes and shapes, was riddled with misspellings and fabricated names. Oregon became “Onegon,” Oklahoma transformed into “Gelahbrin,” and Minnesota was labeled “Ternia.” Strikingly, only Montana and Kansas were correctly identified, with some letters in other state names barely legible.
To determine whether this was a US-specific anomaly, we then asked for a “map of South America” with all countries named. While GPT-5 showed slight improvement, correctly identifying major nations like Argentina, Brazil, Bolivia, Colombia, and Peru, errors persisted. Ecuador appeared as “Felizio,” Suriname as “Guriname,” and Uruguay as “Urigim.” Adding to the confusion, the name for Chile was bizarrely superimposed over southern Argentina.
The challenges extended beyond geography. When prompted to “draw a timeline of the US presidency with the names of all presidents,” GPT-5 delivered its least accurate graphic yet. The timeline listed only 26 presidents, years were illogical and mismatched to individuals, and a host of names were entirely invented. For instance, the fourth president was identified as “Willian H. Brusen,” supposedly residing in the White House in 1991. Other fictional leaders included Henbert Bowen in 1934 and Benlohin Barrison in 1879, with even Thomas Jefferson’s name misspelled.
Curiously, a stark contrast emerged when the model was asked to “make an infographic showing all the actors who played James Bond in order.” After an initial text-only output, a follow-up prompt to include an image yielded a remarkably accurate timeline, omitting only Sean Connery’s role in “Diamonds Are Forever.” This unexpected success highlights a peculiar inconsistency.
It’s important to note that GPT-5 is perfectly capable of providing accurate textual information for the very queries it fails to illustrate correctly. When asked for simple lists of US states, South American countries, or US presidents, the model delivered precise answers. The only minor textual inaccuracy observed was Joe Biden’s tenure listed as “2021-present,” suggesting the model’s training data might not encompass the most recent political developments. OpenAI has yet to disclose a specific training-data cutoff for this model.
The precise reasons behind GPT-5’s struggle with embedded text in images remain unconfirmed by OpenAI. However, industry experts theorize that the problem is inherent to “diffusion”-based image generation, in which a model learns by reconstructing images from noise; rendering accurate text has long been a weak point of that approach. Historically, text generated by diffusion models often resembled indecipherable hieroglyphics rather than coherent language. This difficulty is not unique to OpenAI. Bing Image Creator, for example, produced similarly flawed US maps, even misspelling the country as “United States Ameriicca,” and struggled with the James Bond timeline.
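A toy illustration helps show why. The sketch below is plain NumPy, with made-up shapes and parameters of our own rather than anything from a production model: it applies the forward “noising” step that diffusion training reverses, and the thin strokes of a letter-like glyph vanish into noise far sooner than a large solid region would.

```python
import numpy as np

# A toy sketch of the forward "noising" step in diffusion training.
# Real systems use learned denoisers over latent tensors; this only
# illustrates why fine detail such as lettering is fragile.
rng = np.random.default_rng(0)

def add_noise(image: np.ndarray, t: float) -> np.ndarray:
    """Blend an image toward pure Gaussian noise.

    t=0 returns the original image; t=1 returns pure noise. During
    training, the model sees these corrupted images and learns to
    undo the corruption.
    """
    noise = rng.standard_normal(image.shape)
    return np.sqrt(1.0 - t) * image + np.sqrt(t) * noise

# A crude 8x8 letter "T". Its strokes are one pixel wide, so even
# modest noise drowns them out, while a large filled region (say, a
# state's overall silhouette) stays recognizable much longer.
glyph = np.zeros((8, 8))
glyph[1:7, 3] = 1.0   # vertical stroke
glyph[1, 2:6] = 1.0   # top bar

slightly_noisy = add_noise(glyph, 0.1)  # strokes still visible
mostly_noise = add_noise(glyph, 0.9)    # strokes effectively gone
```

In a real system the denoiser must conjure those strokes back from near-noise, and a plausible-looking but wrong letter is as good a guess as any, which is how “Oklahoma” becomes “Gelahbrin.”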
Other leading AI models exhibit their own quirks. Anthropic’s Claude, when asked for a US map, accurately named states but generated an SVG code file rather than a traditional image, resulting in a list-like output within boxes. Interestingly, when GPT-5 was directed to use its “canvas” feature for code-based map generation, it produced an accurate result, suggesting the issue lies specifically with its image generation pipeline, not its ability to process factual data or generate code. Google’s Gemini, while performing worse than GPT-5 on the US map (producing zero correct state names), created an exceptionally detailed James Bond infographic, even including numerous recurring cast members.
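That contrast is easy to reproduce by hand. When a model emits markup instead of pixels, every label is an exact character string that the renderer draws verbatim, so it cannot be misspelled. Below is a minimal sketch in the same spirit as Claude’s SVG output and GPT-5’s canvas mode, with hypothetical box positions chosen by us, not taken from any model:

```python
# Minimal code-based labeling: emit SVG markup with exact text nodes.
# The box positions are hypothetical placeholders, not real geography.
states = [("Montana", 10, 20), ("Kansas", 80, 80), ("Oregon", 10, 80)]

svg_parts = ['<svg xmlns="http://www.w3.org/2000/svg" width="200" height="130">']
for name, x, y in states:
    # Each state becomes a labeled box; the <text> node carries the
    # name verbatim, unlike a diffusion model painting letter shapes.
    svg_parts.append(f'<rect x="{x}" y="{y}" width="60" height="28" '
                     'fill="none" stroke="black"/>')
    svg_parts.append(f'<text x="{x + 4}" y="{y + 18}" font-size="10">{name}</text>')
svg_parts.append("</svg>")

with open("states.svg", "w") as f:
    f.write("\n".join(svg_parts))
```

Opening the resulting file in a browser shows perfectly spelled labels; whether the boxes land in geographically sensible places is a separate problem, which is roughly the trade-off Claude’s list-like boxes reflect.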
Ultimately, the challenge of accurately embedding text within generated images appears to be a widespread hurdle for current large language models and image generators. While these advanced AIs can easily recall and present factual information in text format, translating that knowledge into visually accurate labels within a graphic remains a significant, and often comically flawed, undertaking—unless, it seems, the subject is James Bond.