AI's Biological Blind Spots: Gene Activity Prediction Falls Short
Artificial intelligence and machine learning have delivered some truly spectacular successes in the field of biology, from designing enzymes capable of digesting plastics to engineering proteins that can block snake venom. In an era of seemingly boundless AI hype, it might be tempting to assume that simply unleashing powerful algorithms on the immense datasets we’ve already amassed would lead to a comprehensive understanding of most biological processes, potentially allowing us to bypass labor-intensive experiments and the ethical complexities of animal research.
However, biology encompasses far more than just protein structures. It is exceedingly premature to suggest that AI can be equally effective at tackling all facets of this intricate science. This context makes a recent study particularly intriguing. Researchers evaluated a suite of AI software packages designed to predict how active genes would be in cells exposed to varying conditions. As it turns out, these sophisticated AI systems performed no better than a deliberately simplified prediction method. The findings serve as a crucial reminder that biology is incredibly complex, and success in developing AI systems for one specific biological aspect does not guarantee their general applicability across the field.
The study was spearheaded by a trio of researchers based in Heidelberg: Constantin Ahlmann-Eltze, Wolfgang Huber, and Simon Anders. They noted that several other studies, released while their work was in preprint, reached broadly similar conclusions. The Heidelberg team’s approach is particularly straightforward, making it an excellent illustration of the current limitations.
The AI software examined in their research aimed to predict changes in gene activity. While every cell contains copies of the approximately 20,000 genes in the human genome, not all of them are active at any given time. “Active” in this context means a gene is being transcribed into messenger RNAs (mRNAs), the templates the cell uses to make proteins. Some genes are constantly active at high levels, providing essential housekeeping functions, while others are active only in specific cell types, such as nerve or skin cells, or are switched on by particular conditions like low oxygen or high temperatures.
Over many years, scientists have conducted numerous studies to map the activity of every gene in various cell types under different conditions. These investigations range from using gene chips to identify which mRNAs are present in cell populations to sequencing RNAs from individual cells to pinpoint active genes. Collectively, this research has built a broad, though incomplete, picture linking gene activity to diverse biological circumstances. This vast repository of data could, in theory, be used to train an AI to predict gene activity under untested conditions.
Ahlmann-Eltze, Huber, and Anders specifically tested what are known as single-cell foundation models, which have been trained on this type of gene activity data. The “single-cell” designation indicates that the models learned from gene activity observed in individual cells, rather than averaged across cell populations. The “foundation model” label means they were trained on a broad range of data but require further fine-tuning for specific tasks.
The specific task for these models was to predict how gene activity might change when genes are intentionally altered. When a single gene is lost or activated, sometimes only that gene’s mRNA is affected. However, some genes encode proteins that regulate entire collections of other genes, leading to changes in the activity of dozens of genes. In other cases, altering a gene can impact a cell’s overall metabolism, resulting in widespread shifts in gene activity. The complexity escalates further when two genes are involved. Often, their effects are simply additive—the sum of changes caused by each individual alteration. But if their functions overlap, the outcome can be a synergistic enhancement of some changes, suppression of others, or entirely unexpected modifications.
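To make the additive-versus-synergistic distinction concrete, here is a small numerical sketch in Python. The gene names and numbers are invented for illustration and are not from the study; the point is simply that a non-additive interaction shows up as a gap between the measured combined effect and the sum of the two single-gene effects.

```python
import numpy as np

# Hypothetical changes in the activity of three downstream genes
# (e.g., log fold-changes) after activating gene A alone, gene B
# alone, and both together. All values are invented for illustration.
delta_a = np.array([1.2, 0.0, -0.5])       # effect of activating A alone
delta_b = np.array([0.3, 0.8, -0.1])       # effect of activating B alone
observed_ab = np.array([1.5, 0.8, -2.0])   # measured effect of A and B together

# If the genes acted independently, the combined effect would simply
# be the sum of the two individual effects.
additive_expectation = delta_a + delta_b   # roughly [1.5, 0.8, -0.6]

# Any leftover difference is the interaction: here the third gene is
# suppressed far more strongly than the single-gene effects predict.
interaction = observed_ab - additive_expectation  # roughly [0.0, 0.0, -1.4]

print(additive_expectation)
print(interaction)
```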
To explore these intricate effects, researchers have historically used CRISPR gene-editing technology to intentionally alter the activity of one or more genes. They then sequence all cellular RNAs to observe the resulting changes. This approach, termed Perturb-seq, provides valuable insight into a gene’s function within a cell. For Ahlmann-Eltze, Huber, and Anders, it provided the crucial data needed to determine if their chosen foundation models could be trained to predict these downstream changes in other gene activities.
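To give a sense of what such data looks like, here is a toy illustration of how a Perturb-seq dataset is commonly organized: one row per cell, one column per measured gene, plus a label recording which gene or genes were targeted in that cell. The gene names and counts below are invented, and real datasets span thousands of genes and many thousands of cells.

```python
import pandas as pd

# A toy Perturb-seq-style table: each row is a cell, each numeric
# column holds RNA counts for one measured gene, and "perturbation"
# records which gene(s) were targeted with CRISPR in that cell.
# All names and numbers are invented for illustration.
perturb_seq = pd.DataFrame(
    {
        "perturbation": ["control", "GENE_A", "GENE_B", "GENE_A+GENE_B"],
        "GENE_X": [12, 40, 15, 95],
        "GENE_Y": [30, 31, 70, 72],
        "GENE_Z": [55, 20, 50, 2],
    }
)

# Comparing each perturbed group against the control rows shows which
# downstream genes change when a given gene is activated.
print(perturb_seq.groupby("perturbation").mean())
```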
Starting with the pre-trained foundation models, the researchers conducted additional training using data from experiments where one or two genes were activated with CRISPR. This training dataset included information from 100 individual gene activations and 62 instances where two genes were activated simultaneously. The AI packages were then tasked with predicting the outcomes for another 62 pairs of activated genes. For comparison, the researchers also generated predictions using two remarkably simple models: one that always predicted no change in gene activity, and another that always predicted a simple additive effect (meaning activating genes A and B would produce the combined changes of activating A plus activating B).
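The two baselines are simple enough to spell out in a few lines of code. The sketch below assumes that, for each single-gene activation, we already have a vector of measured changes in the activity of every other gene; the variable and gene names are illustrative rather than taken from the study.

```python
import numpy as np

def no_change_baseline(n_genes):
    """Predict that activating a gene pair changes nothing at all."""
    return np.zeros(n_genes)

def additive_baseline(single_effects, gene_a, gene_b):
    """Predict the pair's effect as the sum of the two single-gene effects."""
    return single_effects[gene_a] + single_effects[gene_b]

# Invented example: measured effects of each single activation on
# three downstream genes (e.g., log fold-changes vs. unperturbed cells).
single_effects = {
    "GENE_A": np.array([1.2, 0.0, -0.5]),
    "GENE_B": np.array([0.3, 0.8, -0.1]),
}

print(no_change_baseline(3))                                  # [0. 0. 0.]
print(additive_baseline(single_effects, "GENE_A", "GENE_B"))  # roughly [1.5, 0.8, -0.6]
```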
The results were underwhelming. “All models had a prediction error substantially higher than the additive baseline,” the researchers concluded. This finding held true even when alternative measurements of AI prediction accuracy were used. The core of the problem appeared to be the fine-tuned foundation models’ inability to accurately predict complex patterns of change, particularly in cases where a pair of alterations interacted synergistically. “The deep learning models rarely predicted synergistic interactions, and it was even rarer that those predictions were correct,” the researchers stated. In a separate test focused specifically on these gene synergies, none of the AI models performed better than the simplified system that merely predicted no changes at all.
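For readers wondering what a “prediction error” means in practice, one common way to score such predictions is the mean squared difference between predicted and observed activity changes, averaged over held-out gene pairs. The study reports its own metrics; the sketch below, with invented numbers, simply illustrates the idea of comparing a model against the additive baseline.

```python
import numpy as np

def prediction_error(predicted, observed):
    """Mean squared difference between predicted and observed changes."""
    predicted, observed = np.asarray(predicted), np.asarray(observed)
    return float(np.mean((predicted - observed) ** 2))

# Invented numbers for one held-out gene pair.
observed_ab = np.array([1.5, 0.8, -2.0])          # measured combined effect
model_prediction = np.array([0.9, 0.4, -0.2])     # hypothetical model output
additive_prediction = np.array([1.5, 0.8, -0.6])  # sum of single-gene effects

print(prediction_error(model_prediction, observed_ab))     # ~1.25
print(prediction_error(additive_prediction, observed_ab))  # ~0.65
```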
The overall conclusions from this work are unequivocally clear. As the researchers themselves wrote, “As our deliberately simple baselines are incapable of representing realistic biological complexity yet were not outperformed by the foundation models, we conclude that the latter’s goal of providing a generalizable representation of cellular states and predicting the outcome of not-yet-performed experiments is still elusive.” It is vital to underscore that “still elusive” does not imply an inability to ever develop AI capable of assisting with this problem. Nor does it mean these findings apply to all cellular states or, even less, to all of biology. However, the study provides a valuable caution at a time when there is immense enthusiasm for the idea that AI’s success in a few specific areas heralds a world where it can be universally applied.