OpenAI's GPT-5: Medical AI Benchmarks & Safety Focus

TechRepublic

OpenAI has officially unveiled GPT-5, its latest and most advanced large language model, marking a significant stride in the evolution of artificial intelligence. Released on August 7, 2025, the new model aims to fundamentally reshape how users interact with AI by unifying a suite of capabilities that previously required switching between specialized tools. GPT-5 is heralded as a more intelligent, faster, and more reliable system, demonstrating state-of-the-art performance across diverse domains including coding, mathematics, writing, and visual perception.

A core focus of GPT-5’s development has been the mitigation of AI hallucinations and deceptive behaviors, concerns that have long plagued generative AI models. OpenAI reports substantial progress in this area: GPT-5 is approximately 45% less prone to factual errors than its predecessor, GPT-4o. Crucially, when operating in its “thinking” mode—a feature that allows the model to engage in deeper analysis for complex queries—GPT-5 is about 80% less likely to produce factual inaccuracies than OpenAI’s o3 model. Efforts to curb intentional deception have also yielded promising results, with the model’s deception rate falling significantly in testing. OpenAI emphasizes that GPT-5 is better equipped to recognize its own limitations and communicate them clearly to the user, enhancing overall reliability. New safety protocols are also in place for sensitive prompts, with GPT-5 designed to offer comprehensive yet safe responses rather than refusing outright.

Perhaps one of the most impactful advancements in GPT-5 is its enhanced performance in the healthcare sector, alongside the introduction of specific mental health guidelines. OpenAI touts GPT-5 as its most capable model to date for health-related inquiries, designed to empower users with information and support for their well-being. The model has demonstrated exceptional performance on HealthBench, a rigorous evaluation benchmark developed with the input of 250 physicians. This benchmark, which includes 5,000 simulated conversations, assesses the AI’s ability to navigate realistic clinical scenarios. On the HealthBench Hard Hallucinations test, GPT-5 achieved a remarkably low hallucination rate of just 1.6%, an eightfold improvement over previous models.

Beyond its diagnostic and informational capabilities, GPT-5 is engineered to function as an “active thought partner” in health discussions, proactively identifying potential concerns and asking clarifying questions to deliver more helpful answers. Early real-world applications by organizations like Amgen, a biotechnology company leveraging GPT-5 for drug design, and Oscar Health, a health insurer utilizing it for clinical reasoning and policy mapping, underscore its practical utility in complex medical contexts.

In response to growing scrutiny over AI’s societal impact, particularly concerning user dependency and psychological effects, OpenAI has integrated specific mental health guidelines into GPT-5. The model can now detect signs of emotional distress in conversations, such as persistent negative sentiment, and respond by providing evidence-based resources. It also employs custom rubrics to evaluate user inputs, enabling it to suggest pauses or guide discussions towards more positive outcomes. While these advancements aim to foster healthier interactions, the ethical implications of AI venturing into therapeutic territory without proper human oversight remain a critical point of industry discussion. They underscore the ongoing need for trust, safety, and transparency in large language models, especially in sensitive domains like healthcare.
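To make the rubric idea concrete, here is a minimal sketch of what a tiered distress check might look like in Python. OpenAI has not published its implementation; the cue list, threshold, and action names below are illustrative assumptions only, standing in for the “persistent negative sentiment” signals and graduated responses described above.

```python
# Hypothetical sketch of a rubric-based distress check.
# Cue words, thresholds, and action labels are illustrative assumptions,
# not OpenAI's actual GPT-5 safety implementation.

DISTRESS_CUES = {"hopeless", "worthless", "overwhelmed", "alone"}

def distress_score(messages):
    """Count how many user messages contain at least one distress cue."""
    hits = 0
    for msg in messages:
        text = msg.lower()
        if any(cue in text for cue in DISTRESS_CUES):
            hits += 1
    return hits

def apply_rubric(messages, threshold=2):
    """Map a conversation to a suggested action using a simple tiered rubric:
    repeated cues suggest surfacing resources; a single cue prompts a
    gentle check-in; otherwise the conversation continues normally."""
    score = distress_score(messages)
    if score >= threshold:
        return "offer_resources"   # e.g., point to evidence-based support
    if score == 1:
        return "check_in"          # ask a clarifying, supportive question
    return "continue"
```

A production system would of course rely on learned sentiment models rather than keyword matching, but the tiered structure, escalating from normal conversation to check-ins to surfacing resources, mirrors the graduated behavior the article describes.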