TPC25 Maps Road to Science-Ready AI: Exascale, Quantum, & Future Plans
The TPC25 conference recently convened leading researchers with a unified objective: to transform frontier-scale artificial intelligence into a practical tool for scientific discovery. Discussions throughout the week highlighted both the immense promise and the significant hurdles that lie ahead in this ambitious endeavor.
Beyond Raw Speed: Crafting Science-Ready AI
Satoshi Matsuoka, Director of RIKEN’s Center for Computational Science, emphasized that today’s commercial foundation models are merely a starting point for scientific applications. Speaking on behalf of RIKEN’s AI for Science Team, Matsuoka detailed critical gaps in data handling, model design, and workflow orchestration that must be addressed before large language models and other machine-learned models can reliably serve scientific research.
RIKEN is actively building the infrastructure to support this vision. While its Fugaku supercomputer remains a global leader with nearly 159,000 CPU nodes, the center is expanding its capabilities with a new GPU complex featuring approximately 1,500 Nvidia Blackwell accelerators and hundreds of additional GPUs and TPUs. RIKEN also operates three quantum systems and is planning a future system expected to achieve zettascale computing (10^21 operations per second) by 2029.
Matsuoka stressed that raw computational speed alone is insufficient. Scientific AI models must inherently understand complex scientific data and workflows. Unlike general-purpose commercial models, scientific applications in physics, chemistry, and biology require specialized features. Scientific data often combines text, equations, images, and sensor streams, frequently at terabyte scales. Current AI systems struggle with domain-specific symbols, units, very long sequences, and ultra-high-resolution scientific images. To overcome this, Matsuoka advocated for custom token vocabularies, sparse attention mechanisms, and physics-aware decoders capable of handling context windows far beyond typical limits.
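To make the cost argument concrete, one widely used sparsity pattern restricts each token to a local attention window, cutting the quadratic cost of full attention down to roughly linear in sequence length. The NumPy sketch below is a minimal illustration of that general pattern, not RIKEN’s design; the window size and array shapes are arbitrary assumptions.

```python
# Minimal sketch of block-local (windowed) sparse attention in NumPy.
# Illustrative only: the windowing scheme, sizes, and names are
# assumptions, not RIKEN's actual mechanism.
import numpy as np

def local_attention(q, k, v, window=64):
    """Each query attends only to keys within +/- `window` positions,
    cutting cost from O(n^2) to roughly O(n * window)."""
    n, d = q.shape
    out = np.empty_like(v)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        scores = q[i] @ k[lo:hi].T / np.sqrt(d)   # scaled dot product
        weights = np.exp(scores - scores.max())   # numerically stable softmax
        weights /= weights.sum()
        out[i] = weights @ v[lo:hi]
    return out

rng = np.random.default_rng(0)
n, d = 1024, 32                                   # long sequence, one small head
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
print(local_attention(q, k, v).shape)             # (1024, 32)
```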
RIKEN is exploring practical methods to improve model efficiency and data comprehension, including advanced data compression techniques like quad-tree tiling and space-filling curves for high-resolution images. These methods offer substantial computational savings without sacrificing accuracy but require new compiler and memory support. For multimodal data, the team is developing hybrid operators that combine neural networks with traditional partial differential equation solvers. Matsuoka also highlighted a shift from monolithic, enormous models to a more diverse spectrum of task-tuned models, including mixture-of-experts architectures and fine-tuned domain models, emphasizing reasoning during inference to reduce costs and enhance robustness.
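As a concrete illustration of the space-filling-curve idea, the Z-order (Morton) curve interleaves the bits of a tile’s (x, y) coordinates so that tiles adjacent in 2-D tend to stay adjacent in the 1-D sequence fed to a model. The sketch below shows the bit-interleaving under an assumed 8x8 tile grid; it is a generic textbook construction, not RIKEN’s pipeline.

```python
# Minimal sketch of Z-order (Morton) tiling, one common space-filling
# curve. Tile coordinates are interleaved bit by bit so that tiles
# near each other in 2-D stay near each other in the 1-D token stream.
# Illustrative only; the grid size and tile handling are assumptions.

def morton_index(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y into a single Z-order index."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # even bit positions <- x
        z |= ((y >> i) & 1) << (2 * i + 1)   # odd bit positions  <- y
    return z

# Order the tiles of an 8x8 grid for serialization into a model's input.
tiles = [(x, y) for y in range(8) for x in range(8)]
tiles.sort(key=lambda t: morton_index(*t))
print(tiles[:8])  # [(0,0), (1,0), (0,1), (1,1), (2,0), (3,0), (2,1), (3,1)]
```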
Unlocking Discovery with Generative Quantum AI
Steve Clark, Head of AI at Quantinuum, explored the transformative potential when quantum computing and AI converge. He outlined Quantinuum’s strategy for “generative quantum AI,” focusing on three synergistic approaches.
First, AI is being leveraged to optimize quantum computing itself. Machine learning techniques such as deep reinforcement learning are applied to quantum circuit compilation, reducing the number of costly quantum gates, and to optimal control and error correction on actual quantum hardware.
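To see the objective such compilers optimize, the toy sketch below greedily applies a single fixed rewrite rule (adjacent self-inverse gates cancel) to shrink gate count; a deep reinforcement learning compiler instead learns a policy over many such rewrites and hardware-specific costs. The gate representation and names are illustrative, not Quantinuum’s toolchain.

```python
# Toy stand-in for learned circuit simplification: one fixed rewrite
# rule (adjacent identical self-inverse gates cancel) applied greedily.
# Deep RL compilers learn policies over many such rewrites; this sketch
# only illustrates the gate-count objective. All names are illustrative.

SELF_INVERSE = {"H", "X", "Y", "Z", "CX"}

def cancel_adjacent(circuit):
    """Remove adjacent identical self-inverse gates on the same qubits.
    A stack is used so that newly exposed pairs also cancel."""
    out = []
    for gate in circuit:                      # gate = (name, qubits)
        if out and out[-1] == gate and gate[0] in SELF_INVERSE:
            out.pop()                         # G followed by G = identity
        else:
            out.append(gate)
    return out

circ = [("H", (0,)), ("H", (0,)), ("CX", (0, 1)), ("CX", (0, 1)), ("X", (1,))]
print(cancel_adjacent(circ))                  # [('X', (1,))]
```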
Second, Clark’s team is investigating how quantum systems can power entirely new forms of AI. This involves redesigning neural networks to operate natively on quantum hardware, utilizing quantum properties like superposition to process information in fundamentally different ways, creating models with no direct classical analogue.
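A common building block for such quantum-native models is the parameterized quantum circuit, whose output is a measured expectation value rather than a classical activation. The one-qubit NumPy sketch below illustrates that generic pattern; it is a pedagogical example, not Quantinuum’s architecture.

```python
# Minimal sketch of a "quantum neuron": a one-qubit parameterized circuit
# whose output is the expectation value <Z>. A generic variational-circuit
# pattern, not Quantinuum's specific design.
import numpy as np

def ry(theta):
    """Rotation about the Y axis of the Bloch sphere."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def quantum_neuron(x, theta):
    """Encode input x as a rotation, apply a trainable rotation theta,
    and return <Z> of the resulting state (a value in [-1, 1])."""
    state = ry(theta) @ ry(x) @ np.array([1.0, 0.0])   # start from |0>
    z = np.array([[1, 0], [0, -1]])
    return float(state @ z @ state)                    # <psi|Z|psi>

print(quantum_neuron(x=0.3, theta=0.5))  # cos(0.8) ~ 0.697
```

Because the output here is the smooth function cos(theta + x), the parameter can be trained with gradient-based methods much as a classical weight would be.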
Third, the strategy involves training AI models on data generated by quantum computers. This allows the AI to learn patterns that classical systems cannot produce. An example is the Generative Quantum Eigensolver, where a transformer model iteratively suggests quantum circuits to find a molecule’s ground state, a method applicable to chemistry, materials science, and optimization.
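Schematically, the loop alternates between proposing circuits and scoring them by measured energy. In the sketch below, a random-perturbation proposer and an exact NumPy state-vector stand in for the transformer and the quantum hardware; the one-qubit Hamiltonian and ansatz are toy assumptions chosen so the result can be checked exactly.

```python
# Schematic of the Generative Quantum Eigensolver loop: a generative model
# proposes circuits, a quantum computer evaluates their energy, and the
# results steer the next proposals. Here a random proposer and a NumPy
# state-vector replace the transformer and the hardware; everything is
# illustrative.
import numpy as np

H = np.array([[1.0, 0.5],          # toy one-qubit Hamiltonian
              [0.5, -1.0]])        # exact ground energy: -sqrt(1.25)

def energy(theta):
    """<psi(theta)|H|psi(theta)> for the one-parameter ansatz RY(theta)|0>."""
    psi = np.array([np.cos(theta / 2), np.sin(theta / 2)])
    return psi @ H @ psi

rng = np.random.default_rng(1)
best_theta, best_e = 0.0, energy(0.0)
scale = 1.0
for step in range(200):
    theta = best_theta + scale * rng.standard_normal()  # "propose a circuit"
    e = energy(theta)                                   # "run it on hardware"
    if e < best_e:                                      # feed the result back
        best_theta, best_e = theta, e
    scale *= 0.99                                       # focus the proposals

print(f"estimated ground energy: {best_e:.4f} (exact: {-np.sqrt(1.25):.4f})")
```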
AI’s Mainstream Ascent in HPC, Yet Challenges Persist
Earl C. Joseph, CEO of Hyperion Research, presented survey findings highlighting AI’s rapid integration into high-performance computing (HPC) environments. AI adoption in HPC has surged from roughly one-third of sites in 2020 to over 90% by 2024, moving beyond experimental stages into mainstream use for tasks like simulation enhancement and large-scale data analysis across government, academia, and industry.
This growth is closely paralleled by increasing cloud adoption, as organizations turn to cloud services to mitigate the high costs and rapid obsolescence of leading-edge hardware, particularly GPUs. The cloud offers access to current-generation hardware and greater flexibility, reducing the need for long-term on-premises investments.
Despite this expansion, significant barriers remain. The most frequently cited challenge is the quality of training data, which has stalled numerous AI projects. Joseph cited Mayo Clinic as an example of an organization that mitigates this risk by exclusively using its own vetted data to train smaller, high-quality language models. Other persistent issues include a shortage of in-house AI expertise, insufficient training data scale, and the inherent complexity of integrating AI into existing HPC environments. Joseph predicted that this complexity will drive the growth of a new market for domain-specific AI software and consulting services. Hyperion’s studies indicate that 97% of surveyed organizations plan to expand their AI use despite rising costs, underscoring the need for significant budget increases as AI infrastructure becomes more expensive.
Mitigating AI Risks with On-Premises Solutions
Jens Domke, team leader of the Supercomputing Performance Research Team at RIKEN, delivered a stark warning about risk mitigation, an aspect often overlooked in the rush to deploy AI for scientific use. He outlined five key risk factors: human error, AI software vulnerabilities, supply chain weaknesses, inherent model risks, and external threats such as legal issues and theft.
Domke provided examples of real-world incidents, including confidential data leaks from companies using cloud-based AI services and security breaches affecting major AI providers. He also highlighted how rapidly developed AI software often lacks robust security, citing instances where basic security protocols were overlooked. The complexity of modern AI workflows, which can involve dozens of software packages, further expands the attack surface.
In response to these pervasive risks, RIKEN is developing its own on-premises AI management capability, envisioning it as a secure, privatized alternative to commercial cloud AI offerings. This in-house solution aims to replicate the functionality of external services while eliminating the risks of data leakage, hacking, and data exfiltration.
RIKEN’s infrastructure will be built on open-source components and feature multi-tiered security enclaves. A semi-open tier will offer broad usability behind a secure firewall, similar to commercial services but within a controlled environment. Higher-security tiers will be reserved for highly confidential operations, such as medical or sensitive internal research. The core principle is “don’t trust anything,” with all models and services containerized, isolated on private networks, and accessed via secure reverse proxies. This approach provides RIKEN with full control over its data and models, allowing for easy integration of open-source models and custom fine-tuned models without external restrictions.
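As a rough illustration of the reverse-proxy pattern, the Python sketch below accepts requests only with a valid token and forwards them to a model backend that is reachable only on a private network. Every hostname, port, and the token check are placeholders rather than RIKEN’s deployment, and a real installation would add TLS termination, audit logging, and a proper identity provider.

```python
# Minimal sketch of the "don't trust anything" access pattern: a reverse
# proxy in front of a model container on a private network that forwards
# only authenticated requests. All names, addresses, and the token check
# are illustrative placeholders.
import http.server
import urllib.request

MODEL_BACKEND = "http://10.0.0.5:8000"   # model container, private network only
VALID_TOKENS = {"example-team-token"}    # stand-in for a real auth service

class AuthProxy(http.server.BaseHTTPRequestHandler):
    def do_POST(self):
        token = self.headers.get("Authorization", "").removeprefix("Bearer ")
        if token not in VALID_TOKENS:             # reject before touching backend
            self.send_error(401, "unauthorized")
            return
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        req = urllib.request.Request(MODEL_BACKEND + self.path, data=body,
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:  # forward to the isolated model
            data = resp.read()
        self.send_response(200)
        self.end_headers()
        self.wfile.write(data)

if __name__ == "__main__":
    http.server.HTTPServer(("0.0.0.0", 8443), AuthProxy).serve_forever()
```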
The Road Ahead
Across the diverse perspectives shared at TPC25, a consistent message emerged: raw computational scale alone is not enough to realize the full potential of scientific AI. The path forward demands domain-tuned models, seamless hybrid classical-quantum workflows, rigorous data quality standards, and robust, proactive risk controls. The coming year will be crucial for translating these insights into shared tools and community standards. If the momentum demonstrated at TPC25 continues, the scientific community will move closer to AI systems that accelerate discovery without compromising trust.