Google's Med-Gemini AI Invents Body Part, Sparks Safety Concerns

The Verge

Google’s healthcare artificial intelligence model, Med-Gemini, recently generated a non-existent anatomical structure in a diagnostic report, an error that medical experts are highlighting as a critical demonstration of the risks associated with deploying AI in clinical settings. The incident, initially downplayed by Google as a “typo,” has ignited a broader discussion about AI “hallucinations” and patient safety.

The specific error appeared in a 2024 research paper introducing Med-Gemini, where the AI diagnosed an “old left basilar ganglia infarct.” Board-certified neurologist and AI researcher Bryan Moore identified that “basilar ganglia” is a conflation of two distinct brain structures: the “basal ganglia,” which aids motor control and learning, and the “basilar artery,” which supplies blood to the brainstem. Conditions affecting these areas require vastly different treatments. Moore flagged the mistake to Google, which subsequently made a quiet, unacknowledged edit to its accompanying blog post, changing “basilar ganglia” to “basal ganglia.” Following public scrutiny from Moore, Google reverted the blog post change but added a clarifying caption, attributing the error to a “common mis-transcription” learned from training data. Crucially, the original research paper, co-authored by over 50 individuals and peer-reviewed, remains uncorrected.

Med-Gemini is a suite of AI models designed to assist medical professionals by summarizing health data, generating radiology reports, and analyzing electronic health records. Google initially promoted it as a “leap forward” with “substantial potential” in various medical fields. The models are still in early trials, but with the “trusted tester program” likely to expand into real-world pilot scenarios, the stakes of such errors are rising.

Medical professionals are expressing profound concern over such inaccuracies. Maulin Shah, Chief Medical Information Officer at Providence, a large healthcare system, described the error as “super dangerous,” emphasizing the critical difference a few letters can make in a medical context. He also warned about AI propagating incorrect information, citing a scenario in which a model picks up a human-made error from medical notes and spreads it, so that later decisions rest on flawed data. Google spokesperson Jason Freidenfelds stated that the company partners with the medical community and is transparent about its models’ limitations, calling the specific error a “clarification” of a “missed pathology.”

The issue extends beyond Med-Gemini. Another Google healthcare model, MedGemma, recently demonstrated inconsistencies. Dr. Judy Gichoya, an associate professor at Emory University School of Medicine, found that MedGemma’s diagnostic accuracy varied significantly based on how questions were phrased. A detailed query might yield a correct diagnosis, while a simpler one for the same image could return a “normal” assessment, missing critical findings such as pneumoperitoneum (free air in the abdominal cavity, visible as gas under the diaphragm on an X-ray).

Experts worry that the general accuracy of AI systems could lead human medical professionals to become complacent, a phenomenon known as automation bias. Dr. Jonathan Chen of Stanford School of Medicine described this as a “very weird threshold moment” where AI tools are being adopted too quickly, despite their immaturity. He stressed that even if AI sometimes performs well, its seemingly authoritative but incorrect outputs can be highly misleading.

The consensus among medical experts is that AI in healthcare must be held to a significantly higher standard than human error rates. Shah advocates for “confabulation alerts”—AI systems designed to identify and flag potential hallucinations by other AI models, either by withholding the information or issuing warnings. Gichoya noted that AI’s tendency to “make up things” rather than admitting “I don’t know” is a major problem in high-stakes fields like medicine. Dr. Michael Pencina, chief data scientist at Duke Health, views the Med-Gemini error more as a hallucination than a typo, underscoring the serious consequences of such mistakes in high-risk applications. He likened the current stage of AI development to the “Wild West.”

While acknowledging the potential benefits, experts like Chen caution against blindly trusting AI, drawing an analogy to driverless cars, where complacency breeds danger. They emphasize that while AI can augment healthcare, it should not replace critical human oversight. The incident with Med-Gemini highlights the urgent need for more rigorous testing, transparent error correction, and a cautious, deliberate approach to integrating AI into clinical practice, where even “imperfect can feel intolerable.”
