Vision AI Models See Illusions Where None Exist

The Register

Advanced artificial intelligence models with visual capabilities are exhibiting a peculiar form of self-delusion: they perceive optical illusions in images where none actually exist. This phenomenon, dubbed “illusion-illusions” by researchers, highlights a significant disconnect in how these systems interpret visual information and relate it to their vast linguistic understanding.

A recent experiment, replicating a test devised by Tomer Ullman, an associate professor in Harvard’s Department of Psychology, vividly demonstrated the issue. When presented with a straightforward image of a duck (not the famous duck-rabbit optical illusion), the current version of ChatGPT, powered by GPT-5, confidently misidentified it. The model responded, “It’s the famous duck-rabbit illusion, often used in psychology and philosophy to illustrate perception and ambiguous figures.” Despite the image containing only a duck, ChatGPT even offered to highlight both “interpretations,” producing a distorted, chimeric output.
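This kind of probe is straightforward to reproduce. Below is a minimal sketch of sending a local image and a question to a vision-capable model through an OpenAI-style chat completions API; the model name, file name, and prompt are illustrative assumptions, not details taken from Ullman’s paper or the test described above.

```python
# Minimal sketch of an "illusion-illusion" probe, assuming an OpenAI-style
# chat completions API with vision input. Model name and image path are
# hypothetical placeholders.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_about_image(path: str, question: str, model: str = "gpt-5") -> str:
    """Send a local image plus a question and return the model's text reply."""
    with open(path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


# An ordinary duck drawing, not the duck-rabbit figure.
print(ask_about_image("duck.png", "What is shown in this image?"))
```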

Ullman detailed this behavior in his recent preprint paper, “The Illusion-Illusion: Vision Language Models See Illusions Where There are None.” He explains that optical illusions are invaluable diagnostic tools in cognitive science, philosophy, and neuroscience because they reveal the inherent gap between objective reality and subjective perception. Similarly, they can offer crucial insights into the workings of artificial intelligence systems. Ullman’s research specifically investigates whether contemporary vision language models mistakenly identify certain images as optical illusions, even when humans would easily perceive them without ambiguity.

His paper outlines numerous instances of these “illusion-illusions,” where AI models detect something resembling a known optical illusion, yet the image creates no visual uncertainty for human observers. The comprehensive evaluation included a range of prominent vision language models: GPT-4o, Claude 3, Gemini Pro Vision, miniGPT, Qwen-VL, InstructBLIP, BLIP2, and LLaVA-1.5. To varying degrees, all of them exhibited this tendency to perceive illusions where none were present, with none matching human performance.
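To give a sense of what such an evaluation involves, here is a rough sketch of a scoring loop, assuming each model is wrapped as a callable that answers a question about an image (such as the helper above). The file names and the simple keyword check are placeholders; the actual stimuli and scoring procedure are those described in the paper.

```python
# Sketch of a scoring loop for illusion-illusion probes. The keyword check is
# a crude stand-in for the paper's real scoring method, and the test set below
# is hypothetical.
from typing import Callable, Dict, List, Tuple

# (image path, whether the image really is an ambiguous/illusory figure)
TEST_SET: List[Tuple[str, bool]] = [
    ("duck_only.png", False),    # plain duck: an illusion-illusion probe
    ("duck_rabbit.png", True),   # the genuine ambiguous figure
]


def count_illusion_illusions(ask: Callable[[str, str], str]) -> Dict[str, int]:
    """Count cases where a model reports an illusion in an unambiguous image."""
    question = "Is this image an optical illusion or an ambiguous figure?"
    results = {"illusion_illusions": 0, "correct": 0}
    for path, is_real_illusion in TEST_SET:
        answer = ask(path, question).lower()
        claims_illusion = "illusion" in answer or "ambiguous" in answer
        if claims_illusion and not is_real_illusion:
            results["illusion_illusions"] += 1  # saw an illusion where none exists
        elif claims_illusion == is_real_illusion:
            results["correct"] += 1
    return results
```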

The three leading commercial models tested (GPT-4o, Claude 3, and Gemini Pro Vision) were capable of recognizing actual visual illusions but also misidentified illusion-illusions. Other models, such as miniGPT, Qwen-VL, InstructBLIP, BLIP2, and LLaVA-1.5, showed more mixed results. However, Ullman cautions against interpreting this as superior resistance to self-deception. Instead, he attributes their varied performance to generally lower visual acuity: these models are simply less capable at image recognition across the board, rather than immune to perceiving non-existent illusions. The data supporting Ullman’s findings has been made publicly available.

Ullman further clarifies that this behavior isn’t directly analogous to human apophenia (seeing patterns in random data) or pareidolia (perceiving meaningful images in ambiguous stimuli). He also distinguishes it from the commonly used AI term “hallucination,” which he believes has lost its precise meaning, often simply referring to any model mistake. Instead, Ullman suggests the AI’s error is more akin to a human cognitive shortcut: misidentifying a new problem as a familiar one and applying an inappropriate solution. It’s as if the machine falsely identifies an image as an illusion and proceeds based on that incorrect premise.

Regardless of the precise terminology, Ullman emphasizes that this disconnect between vision and language in current AI models warrants close scrutiny, particularly given their increasing deployment in critical applications such as robotics and other AI services. While acknowledging ongoing research into these limitations, he warns that it would be deeply concerning if these systems were relied upon on the assumption that their visual and linguistic components integrate seamlessly. The consensus among serious researchers, he notes, is that these fundamental misinterpretations demand continued, deeper investigation.
