TPC25: AI's Expanding Role in Science, Multimodal Data & Non-LLMs
Last week’s TPC25 conference brought to the forefront a series of critical questions shaping the future of artificial intelligence, moving beyond the hype of large language models to tackle fundamental challenges in data, evaluation, and accountability. A plenary session featuring four distinguished speakers delved into how to accelerate scientific discovery with AI while maintaining control, the intricate task of tracing a language model’s outputs back to its training data, the complex notion of fairness when AI interprets maps instead of text, and the unprecedented ways these sophisticated systems can fail.
Prasanna Balaprakash, director of AI Programs at Oak Ridge National Laboratory (ORNL), illuminated the institution’s long-standing engagement with AI, stretching back to 1979. He highlighted ORNL’s historical role in pioneering AI for scientific applications, from early rule-based expert systems to housing powerful supercomputers like Titan and the current Frontier, equipped with tens of thousands of GPUs. Today, ORNL’s AI initiative prioritizes building assured, efficient AI models for scientific simulation, experimental facilities, and national security. This involves developing robust methods for validation, verification, uncertainty quantification, and causal reasoning, alongside strategies for scaling large models on supercomputers and deploying smaller models at the edge. Balaprakash emphasized ORNL’s focus on non-traditional modalities, such as the large-scale spatiotemporal data crucial for nuclear fusion simulations, leading to breakthroughs like the Oak Ridge Base Foundation Model for Earth System Predictability, which achieved exascale throughput and scaled to models with up to 10 billion parameters, a first for this type of data. He also detailed efforts in large-scale graph foundation models for materials science and the integration of AI with experimental instruments, enabling real-time data processing and intelligent steering of experiments to optimize resource use.
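To make the experiment-steering idea concrete, the sketch below shows a generic active-learning loop of the kind such systems build on: an ensemble surrogate is refit after each measurement, and the next instrument setting is chosen where the ensemble disagrees most, so limited instrument time goes to the most informative points. The measurement function, polynomial surrogate, and acquisition rule here are hypothetical stand-ins, not ORNL’s actual pipeline.

```python
# Illustrative sketch of AI-in-the-loop experiment steering (hypothetical components,
# not ORNL's pipeline): refit an ensemble surrogate after each measurement and steer
# the instrument to the candidate setting where the ensemble disagrees most.
import numpy as np

rng = np.random.default_rng(0)

def run_instrument(x):
    """Stand-in for a real measurement (a made-up response curve plus noise)."""
    return np.sin(3 * x) + 0.1 * rng.normal()

def fit_ensemble(X, y, n_models=8, degree=3):
    """Fit several polynomial surrogates, each on a noise-perturbed copy of the data."""
    return [
        np.polyfit(X, y + 0.05 * rng.normal(size=len(y)), deg=min(degree, len(X) - 1))
        for _ in range(n_models)
    ]

def acquisition(models, candidates):
    """Score each candidate setting by ensemble disagreement (a crude uncertainty proxy)."""
    preds = np.stack([np.polyval(m, candidates) for m in models])
    return preds.std(axis=0)

# Seed with a few measurements, then steer the next ten adaptively.
X = rng.uniform(0.0, 2.0, size=4)
y = np.array([run_instrument(x) for x in X])
candidates = np.linspace(0.0, 2.0, 200)

for step in range(10):
    models = fit_ensemble(X, y)
    x_next = candidates[np.argmax(acquisition(models, candidates))]
    X = np.append(X, x_next)
    y = np.append(y, run_instrument(x_next))
    print(f"step {step}: next measurement at x = {x_next:.3f}")
```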
Shifting focus to the inner workings of large language models, Jiacheng Liu of the Allen Institute for AI (AI2) introduced OLMoTrace, an innovative system designed to open the “black box” of LLMs. This tool, integrated within AI2’s open OLMo model family, allows users to trace an LLM’s generated response directly back to the specific segments of its multi-trillion-token training dataset. Utilizing an optimized indexing system, OLMoTrace quickly identifies exact matches between model outputs and their source documents, making it possible to fact-check information, understand the provenance of a model’s answer, and even expose the roots of “hallucinations”—instances where models generate fabricated content. Liu demonstrated how the system revealed that a model had learned to produce fake code execution results from training dialogues where students provided outputs without actually running the code. For researchers and practitioners, this level of transparency is invaluable for auditing model behavior, ensuring compliance with emerging AI governance rules, and complementing mechanistic interpretability studies by linking high-level behaviors to underlying data.
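Stripped of the engineering that makes it work at multi-trillion-token scale, the core idea is verbatim span matching against an indexed training corpus. The toy Python sketch below builds a small n-gram index over two invented “training documents” and reports the longest spans of a model output that appear word-for-word in them; the real OLMoTrace relies on far more efficient index structures over vastly larger data, so this is an illustration of the concept rather than its implementation.

```python
# Toy illustration of tracing model output back to training text, in the spirit of
# OLMoTrace (the real system indexes a multi-trillion-token corpus; this sketch just
# scans two invented in-memory "documents" for the longest verbatim overlaps).
from collections import defaultdict

corpus = {
    "doc1": "the mitochondria is the powerhouse of the cell",
    "doc2": "students provided outputs without actually running the code",
}

def build_ngram_index(corpus, n=3):
    """Map every n-gram of whitespace tokens to the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        tokens = text.split()
        for i in range(len(tokens) - n + 1):
            index[tuple(tokens[i:i + n])].add(doc_id)
    return index

def trace(output, index, corpus, n=3):
    """Return maximal output spans that appear verbatim in some training document."""
    tokens = output.split()
    hits = []
    i = 0
    while i <= len(tokens) - n:
        gram = tuple(tokens[i:i + n])
        if gram in index:
            # Greedily extend the match to the right while it still occurs verbatim.
            j = i + n
            docs = index[gram]
            while j < len(tokens):
                span = " ".join(tokens[i:j + 1])
                still = {d for d in docs if span in corpus[d]}
                if not still:
                    break
                docs, j = still, j + 1
            hits.append((" ".join(tokens[i:j]), sorted(docs)))
            i = j
        else:
            i += 1
    return hits

index = build_ngram_index(corpus)
answer = "as we know the mitochondria is the powerhouse of the cell indeed"
for span, docs in trace(answer, index, corpus):
    print(f"'{span}' <- {docs}")
```

Run on the invented example above, the sketch flags the quoted span and the document it came from, which is the same kind of evidence OLMoTrace surfaces when auditing a response or hunting down the source of a hallucination.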
A more sobering perspective on AI’s societal impact came from Ricardo Baeza-Yates, director of the BSC AI Institute, who delivered a critical overview of what he terms “Irresponsible AI.” He argued that current AI systems are prone to failures such as automated discrimination, the spread of misinformation, and resource waste, often because they are treated as mirrors of human reasoning rather than mere predictive engines. Baeza-Yates cautioned against anthropomorphizing AI with terms like “ethical AI,” asserting that ethics and trust are inherently human qualities, and attributing them to machines deflects responsibility from their human designers. He highlighted the escalating harms of generative AI, from disinformation to copyright disputes and mental health concerns, citing tragic cases where chatbots were implicated in suicides. He underscored the danger of “non-human errors”—mistakes AI makes that humans would not, for which society is ill-prepared. Baeza-Yates contended that measuring AI success solely by accuracy is insufficient; instead, the focus should be on understanding and mitigating mistakes. He also challenged the narrative of AI democratization, pointing out that linguistic and digital divides effectively exclude a vast portion of the global population from accessing leading AI models.
Finally, Dr. Kyoung Sook Kim, Deputy Director at Japan’s National Institute of Advanced Industrial Science and Technology (AIST), addressed the critical issue of fairness in geospatial AI (GeoAI). As GeoAI increasingly interprets satellite imagery, urban infrastructure, and environmental data for applications like disaster response and city planning, ensuring equitable outcomes becomes paramount. Dr. Kim explained that unlike text or image AI, geospatial systems face unique challenges in defining and measuring fairness. Uneven data collection, gaps in spatial coverage, and biased assumptions during model training can lead to skewed results, particularly impacting resource allocation and planning decisions. Fairness in GeoAI, she argued, cannot be a one-size-fits-all solution but must account for regional differences, population variations, and the quality of available data. She stressed the importance of scrutinizing early design decisions—how data is selected, labeled, and processed—to prevent bias from being embedded into the systems. Dr. Kim advocated for shared frameworks and international standards, including new ISO efforts, to establish consistent definitions of fairness and data quality, recognizing that the contextual nature of geography, history, and social complexity demands a nuanced approach to building and applying these powerful systems.
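As a deliberately simplistic illustration of why a single fairness number rarely suffices in GeoAI, the sketch below computes two different disparity measures over hypothetical regions: a ratio of labeled-sample density (a proxy for uneven spatial coverage) and a gap in model error. The region names and figures are invented, and a real audit would weigh many more context-specific factors of the kind Dr. Kim described.

```python
# Minimal sketch of quantifying regional disparity in a GeoAI pipeline.
# All regions and numbers are hypothetical; real fairness audits need
# context-specific metrics, not just these two summary figures.

regions = {
    # name:          area (km^2), labeled samples, mean prediction error
    "urban_core":  {"area": 120.0,  "samples": 9000, "error": 0.08},
    "suburbs":     {"area": 640.0,  "samples": 4200, "error": 0.12},
    "rural_north": {"area": 2300.0, "samples": 600,  "error": 0.31},
}

def coverage(r):
    """Labeled samples per square kilometre, a crude proxy for spatial data coverage."""
    return r["samples"] / r["area"]

cov = {name: coverage(r) for name, r in regions.items()}
err = {name: r["error"] for name, r in regions.items()}

coverage_ratio = max(cov.values()) / min(cov.values())  # best- vs worst-covered region
error_gap = max(err.values()) - min(err.values())       # worst- vs best-served region

print(f"coverage ratio (best/worst region): {coverage_ratio:.1f}x")
print(f"error gap (worst - best region):    {error_gap:.2f}")
```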
Collectively, these discussions at TPC25 signaled a significant evolution in AI research. As models grow in complexity and scale, the emphasis is shifting from mere performance benchmarks to a deeper understanding of data provenance, rigorous output evaluation, and the real-world impact of AI. The future of AI, these experts agreed, hinges not just on smarter algorithms, but on how responsibly and inclusively they are designed, built, and deployed.