DeepMind has introduced AlphaGenome, an AI model designed to predict how even small changes in DNA influence gene activity. The model focuses on the non-coding regions of DNA - stretches that do not carry direct blueprints for proteins but act as regulatory control centers, determining when and how genes are switched on or off. These regions make up the bulk of the human genome and have long been difficult to interpret.
AlphaGenome analyzes up to a million DNA letters in one pass, zeroing in on these non-coding segments, which account for about 98 percent of human DNA. The regions contain many disease-related variants, but until now they have been notoriously hard to decode.
The model predicts a range of molecular properties for every position in a DNA sequence, including where genes start and end, how much RNA is produced, and where certain proteins are likely to bind. It also identifies splicing sites - points where RNA is cut and rejoined during gene expression. Mistakes in this process can lead to serious disease.
AlphaGenome makes its predictions at single-base resolution, covering hundreds of cell types and tissues. DeepMind combined several AI techniques to achieve this: convolutional layers detect short DNA patterns, transformers handle long-range dependencies, and additional layers combine the results to generate predictions.
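To make the division of labor between those layers concrete, here is a deliberately tiny sketch in plain Python. It is not AlphaGenome's architecture: the hand-written "TATA" filter, the scalar attention step, and every function name are assumptions chosen for illustration, whereas a real model learns many filters and attention weights from data.

```python
import math
from typing import List

BASES = "ACGT"


def one_hot(sequence: str) -> List[List[float]]:
    """Encode each DNA letter as a 4-element vector over A, C, G, T."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in sequence]


def conv1d(x: List[List[float]], kernel: List[List[float]]) -> List[float]:
    """Convolution step: slide a short filter along the sequence.

    Positions past the end of the sequence are treated as zeros, so the
    output has one score per input base.
    """
    n, width = len(x), len(kernel)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(width):
            if i + k < n:
                total += sum(xc * kc for xc, kc in zip(x[i + k], kernel[k]))
        out.append(total)
    return out


def self_attention(features: List[float]) -> List[float]:
    """Attention step: every position takes a softmax-weighted average of
    all positions, so distant parts of the sequence can influence it."""
    out = []
    for query in features:
        logits = [query * key for key in features]
        peak = max(logits)  # subtract the maximum for numerical stability
        weights = [math.exp(logit - peak) for logit in logits]
        norm = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, features)) / norm)
    return out


def predict_track(sequence: str) -> List[float]:
    """Output step: combine local and long-range features per base."""
    kernel = one_hot("TATA")  # hypothetical hand-written motif filter
    local = conv1d(one_hot(sequence), kernel)
    mixed = self_attention(local)
    return [l + m for l, m in zip(local, mixed)]
```

Calling `predict_track("GGTATAGG")` returns one value per base. The convolution score peaks where the illustrative motif starts, and the attention step lets that peak influence every other position.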
One model, many tasks
According to DeepMind, AlphaGenome outperforms the best existing models on 22 out of 24 benchmarks for predictions on single DNA sequences, and matches or exceeds specialized tools for predicting the regulatory effects of genetic variants in 24 out of 26 cases. It is currently the only model that can forecast all tested molecular properties at once. Training data comes from large public research projects such as ENCODE, GTEx, FANTOM5, and 4D Nucleome, which provide experimental data on gene regulation across different cell types.
A key feature is how efficiently AlphaGenome assesses genetic variants: it compares predictions for mutated and non-mutated sequences and summarizes the differences for each property. The model can also predict splice junctions directly from DNA sequence, which could advance research into genetic diseases linked to splicing errors.
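The compare-and-summarize idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DeepMind's implementation or the AlphaGenome API: `toy_predict` is a made-up stand-in for a real model, the motif "AACG" is arbitrary, and the maximum absolute difference is just one possible summary statistic.

```python
from typing import Callable, Dict, List

# Hypothetical interface: a predictor maps a DNA string to named per-base tracks.
Tracks = Dict[str, List[float]]
Predictor = Callable[[str], Tracks]


def apply_snv(sequence: str, position: int, alt: str) -> str:
    """Return `sequence` with the base at 0-based `position` replaced by `alt`."""
    if len(alt) != 1 or alt not in "ACGT":
        raise ValueError("alt must be one of A, C, G, T")
    if not 0 <= position < len(sequence):
        raise IndexError("position outside sequence")
    return sequence[:position] + alt + sequence[position + 1:]


def score_variant(predict: Predictor, sequence: str,
                  position: int, alt: str) -> Dict[str, float]:
    """Predict on reference and mutated sequence, then summarize each track
    by its largest absolute per-base change (an assumed summary choice)."""
    ref_tracks = predict(sequence)
    alt_tracks = predict(apply_snv(sequence, position, alt))
    scores = {}
    for name, ref_values in ref_tracks.items():
        alt_values = alt_tracks[name]
        scores[name] = max(abs(a - r) for a, r in zip(alt_values, ref_values))
    return scores


def toy_predict(sequence: str) -> Tracks:
    """Toy stand-in for a real model, present only so the sketch runs."""
    motif = "AACG"  # arbitrary illustrative motif, not a real binding site
    gc_signal = [1.0 if base in "GC" else 0.0 for base in sequence]
    motif_binding = [1.0 if sequence[i:i + len(motif)] == motif else 0.0
                     for i in range(len(sequence))]
    return {"gc_signal": gc_signal, "motif_binding": motif_binding}
```

With this toy predictor, `score_variant(toy_predict, "TTAACTTT", 5, "G")` flags a change in the `motif_binding` track because the substitution creates the illustrative motif, while substituting "A" at the same position leaves both tracks unchanged.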
Applications in disease and basic research
DeepMind says AlphaGenome could help researchers better understand the genetic roots of disease. In one example, the model analyzed a mutation seen in T-cell acute lymphoblastic leukemia (T-ALL) and correctly predicted that the mutation would create a new binding site for the MYB protein, activating a nearby cancer gene - a known disease mechanism.
Beyond disease research, AlphaGenome could be useful in synthetic biology, such as designing DNA sequences for targeted gene regulation. It can also help pinpoint functional genome elements that control specific cell types.
Not a clinical tool - yet
For now, AlphaGenome is available only for non-commercial research via an API. DeepMind stresses that the model was not developed or validated for clinical use. It cannot fully capture complex disease processes shaped by development or environment, and its ability to predict effects from distant regulatory elements - more than 100,000 DNA bases away - is still limited.
Still, DeepMind sees room for growth: with more training data, AlphaGenome could expand to cover additional species, cell types, or molecular processes. The architecture is flexible and scalable, according to the research team.
AlphaGenome predicts how changes in non-coding DNA affect gene regulation, offering insight into regions long considered a mystery. | Image: DeepMind