DeepMind Open Sources Aeneas AI for Ancient Text Analysis
Google DeepMind has introduced Aeneas, a generative AI model designed to help historians interpret ancient inscriptions. Released as open source, Aeneas accepts both text and image inputs and outperforms previous state-of-the-art models at restoring missing characters in damaged texts.
Aeneas is built to support epigraphy, the study of ancient inscriptions carved on stone, metal, or other durable materials. The model assists with four tasks: dating an inscription, identifying its geographical origin, reconstructing partial or fragmented text, and finding “parallels”, that is, other inscriptions or texts that contain similar words or phrasing. Aeneas is based on a multimodal transformer architecture with specialized components for each of these tasks. When benchmarked against other AI models and against human experts on several epigraphic tasks, Aeneas outperformed both. Notably, when historians used Aeneas as an aid, their combined performance surpassed both the human-only and the AI-only results.
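To make the idea of retrieving “parallels” concrete, here is a minimal sketch in Python that ranks a small corpus by textual similarity to a query. It is not how Aeneas works: the model is described only as a multimodal transformer, whereas this toy uses character trigram counts and cosine similarity. The function names (`char_ngrams`, `cosine`, `rank_parallels`) and the sample strings are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a lowercased, whitespace-normalized string."""
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_parallels(query: str, corpus: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    """Return the top_k corpus texts most similar to the query, best match first."""
    query_vec = char_ngrams(query)
    scored = [(cosine(query_vec, char_ngrams(text)), text) for text in corpus]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


# Toy inputs only: short Latin formulae, not drawn from the Latin Epigraphic Dataset.
corpus = [
    "dis manibus sacrum",
    "senatus populusque romanus",
    "votum solvit libens merito",
]
results = rank_parallels("dis manibus", corpus, top_k=2)
for score, text in results:
    print(f"{score:.2f}  {text}")
```

A surface-level measure like this only finds shared wording; a learned model can in principle also match inscriptions that are related in ways the spelling does not show.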
DeepMind envisions Aeneas as a flexible tool capable of adapting to a wide array of ancient languages, scripts, and media, extending its utility beyond stone inscriptions to include papyri and coinage. This adaptability aims to facilitate connections across a broader spectrum of historical evidence and is part of a larger initiative to explore how generative AI can enhance the identification and interpretation of historical parallels on a vast scale. To ensure its benefits reach a wide audience, an interactive version of Aeneas has been made freely available to researchers, students, educators, and museum professionals.
Aeneas builds on DeepMind’s earlier Ithaca project, a text-only model focused on ancient Greek epigraphy. Aeneas adds three capabilities that Ithaca lacked: image input, reconstruction of inscriptions with an unknown number of missing characters, and direct output of identified parallels.
To train Aeneas, DeepMind compiled the Latin Epigraphic Dataset (LED), a corpus of 176,861 inscriptions. The dataset was built from existing source materials, which were run through a pipeline that cleans, standardizes, and integrates the records into a unified format. The inscriptions in the LED span from the 7th century BCE to the 8th century CE and come from across the Roman world, from Britain to Mesopotamia.
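As a rough illustration of what integrating records into a unified format can involve, the Python sketch below merges two differently shaped sources into one record type. Everything in it is hypothetical: the field names, the two source schemas, and the sample records are assumptions for illustration and do not describe the actual LED pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnifiedInscription:
    """Hypothetical unified record; field names are illustrative assumptions."""
    source: str
    text: str
    year: int      # signed year: negative means BCE, positive means CE
    region: str


def signed_year(value: int, era: str) -> int:
    """Put BCE and CE dates on one signed axis so they sort together."""
    era = era.strip().upper()
    if era in {"BCE", "BC"}:
        return -value
    if era in {"CE", "AD"}:
        return value
    raise ValueError(f"unknown era: {era!r}")


def clean_text(raw: str) -> str:
    """Collapse runs of whitespace and lowercase the transcription."""
    return " ".join(raw.split()).lower()


def from_source_a(rec: dict) -> UnifiedInscription:
    """Convert a record from hypothetical source A, which stores (year, era) tuples."""
    year, era = rec["date"]
    return UnifiedInscription(f"A:{rec['id']}", clean_text(rec["transcription"]),
                              signed_year(year, era), rec["province"])


def from_source_b(rec: dict) -> UnifiedInscription:
    """Convert a record from hypothetical source B, which stores year and era separately."""
    return UnifiedInscription(f"B:{rec['ref']}", clean_text(rec["txt"]),
                              signed_year(rec["year"], rec["era"]), rec["place"])


# Made-up sample records in two different layouts.
source_a = [{"id": "1", "transcription": "  DIS   MANIBUS ", "date": (50, "CE"),
             "province": "Britannia"}]
source_b = [{"ref": "7", "txt": "Votum solvit", "year": 650, "era": "BCE",
             "place": "Latium"}]

unified = sorted(
    [from_source_a(r) for r in source_a] + [from_source_b(r) for r in source_b],
    key=lambda r: r.year,
)
for record in unified:
    print(record)
```

Storing dates as signed integers is one simple choice for a range that crosses from BCE to CE; a real pipeline would more likely need date ranges and uncertainty markers.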
To evaluate Aeneas as a research tool, DeepMind conducted a study with 23 epigraphic experts, who used Aeneas in a simulated real-world research setting under time constraints. The study found that although the experts selected parallels for inscriptions manually, they frequently included at least one additional parallel suggested by Aeneas. One researcher said that the parallels retrieved by Aeneas completely shifted their historical focus and reduced a task that would typically take days to 15 minutes, which would free up time for deeper analysis and for framing research questions.
Discussion of the model has also highlighted complexities inherent in ancient history research. Some observers point out that, even with advanced AI, historical interpretations often involve “educated guesses” based on incomplete or partially corrupted information. They note that historical data, even from well-documented periods, carries “data quality issues” arising from the biases and perspectives of the original authors. Aeneas therefore serves as an aid in navigating these challenges, and its outputs still call for the interpretive judgment that historical inquiry depends on. The Aeneas code and an interactive demo are publicly available.