GPT-5 Leads Legal AI Benchmarks, Nears 'Last Mile' Progress
OpenAI’s GPT-5 model has achieved a significant milestone in the realm of legal artificial intelligence, scoring an impressive 89.22% on Harvey’s “BigLaw Bench” evaluation system. This performance marks GPT-5 as the best-performing OpenAI model assessed by Harvey, a leading generative AI pioneer in the legal tech sector.
Launched last year, Harvey’s BigLaw Bench was designed to rigorously gauge the quality of generative AI responses, specifically evaluating how closely they align with the expectations of a legal professional. The system employs custom-designed rubrics to assess two critical dimensions: “Answer Quality,” which scrutinizes the completeness, accuracy, and appropriateness of the model’s response for effective task completion; and “Source Reliability,” which evaluates the AI’s capacity to provide verifiable and correctly cited sources for its assertions, thereby enhancing trust and facilitating validation. Scores are meticulously calculated by accumulating positive points for meeting task requirements and deducting points for errors or missteps, such as AI hallucinations, with the final result expressed as a percentage.
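The scoring scheme described above can be illustrated with a minimal sketch. The article does not give Harvey's exact formula, so the normalization used here (net points divided by the total positive points available) is an assumption, and the `Criterion` class, the `rubric_score` function, and the sample rubric are hypothetical names and values invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One rubric line (hypothetical structure, not Harvey's actual schema)."""
    description: str
    points: float    # positive for a task requirement, negative for a penalty
    triggered: bool  # True if the requirement was met or the error occurred


def rubric_score(criteria):
    """Return net points as a percentage of the positive points available.

    Assumption: the score is normalized by the sum of all positive criteria,
    whether or not the response earned them.
    """
    available = sum(c.points for c in criteria if c.points > 0)
    if available <= 0:
        raise ValueError("rubric needs at least one positive criterion")
    # Triggered requirements add points; triggered penalties subtract them.
    earned = sum(c.points for c in criteria if c.triggered)
    return 100.0 * earned / available


# Illustrative rubric with made-up values.
example = [
    Criterion("Identifies the governing statute", 4, True),
    Criterion("Applies the statute to the facts", 3, True),
    Criterion("Flags the key exception", 3, False),
    Criterion("Cites a nonexistent case (hallucination)", -1, True),
]

print(rubric_score(example))  # (4 + 3 - 1) / 10 -> 60.0
```

Under this assumed normalization, a hallucination costs points even when every requirement is met, so a response cannot reach 100% while containing a penalized error.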
GPT-5’s score of 89.22% represents a notable advancement: an improvement of about 5 percentage points over the next closest OpenAI model, o3, which scored 84.13%. Harvey evaluates models from various companies, but these particular results compare OpenAI models only and so highlight OpenAI’s own progress. This level of performance is beginning to approach what industry experts term “last mile” territory in AI development: the final, most challenging stage, where AI outputs are refined and reliable enough that legal professionals can confidently approve them for direct use with minimal human intervention. Achieving initial, somewhat accurate results is relatively straightforward for many large language models, but pushing past the 90% threshold and through this “last mile” towards 99% accuracy is a fundamentally different and much harder challenge.
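The gap between the two reported scores can be read in two ways, as an absolute difference in percentage points or as a gain relative to o3's score. The short sketch below works through both using only the two figures given in the article; the function names are hypothetical.

```python
def point_gap(new_score, old_score):
    """Absolute difference between two percentage scores, in percentage points."""
    return round(new_score - old_score, 2)


def relative_gain(new_score, old_score):
    """Improvement expressed as a percentage of the older score."""
    return round(100.0 * (new_score - old_score) / old_score, 2)


gpt5, o3 = 89.22, 84.13  # scores reported in the article

print(point_gap(gpt5, o3))      # 5.09 percentage points
print(relative_gain(gpt5, o3))  # 6.05 percent relative to o3
```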
Despite the inherent difficulties, progress is being made quickly. New generative AI models will continue to bring incremental improvements, but larger leaps in performance may come from other enhancements, such as improving the underlying verification layers. Near-perfect accuracy, perhaps 99.9%, is likely still years away, much as in autonomous driving, where a high degree of success in unstructured environments is very difficult to achieve but ultimately attainable with sustained investment. The legal sector’s rapid evolution over the past three years, from widespread skepticism about AI to a majority of large law firms and their clients deeply engaging with the technology, underscores the impact of these improving model performances. Without the tangible gains delivered by large language models, such enthusiastic adoption of legal AI tools would not have materialized.
Harvey plans to integrate GPT-5’s enhanced capabilities into its systems to enable more powerful use cases, particularly in document drafting and complex research. GPT-5 stands out as the first orchestration model capable of combining multiple tasks, allowing a single AI agent to both collaborate with a user on research and produce a finished work product. For instance, in a complex scenario like identifying inconsistencies between internal guidance documents and current regulations across the United States and the European Union, GPT-5 can orchestrate various agents. These agents could review internal documents for relevant trends, find recent changes in global regulations, perform a comprehensive gap analysis, and then draft a memo outlining recommendations for updating internal guidance to ensure regulatory alignment, all while prompting the user for additional context as needed.
Coupled with recent data partnerships with legal information giants LexisNexis and iManage, Harvey’s systems can now draw on a comprehensive view of both public and proprietary legal data before acting. This data access, combined with GPT-5’s substantially improved tool-use and drafting abilities, supports a deeply integrated AI system that can reason over an organization’s internal data and leverage trusted third-party content in real time. This advancement brings Harvey closer to its core mission: creating an “intelligent coworker” capable of navigating the dynamic, iterative, and collaborative nature of complex legal matters.