DeepMind Just Cracked the "Junk" DNA Problem
And Nobody's Talking About It

Hey friend,
Remember when scientists called 98% of your DNA "junk"?
For decades, that was the official position. Only about 2% of your genome codes for proteins - the stuff that actually builds your body.
The rest? Filler. Evolutionary leftovers. Biological noise.
Turns out that was spectacularly wrong.
Much of that "junk" DNA helps control how your genes behave. When they turn on. When they turn off. How much protein gets made. Where in your body it happens.
It's not junk. It's the control panel.
The problem? We had no idea how to read it.
Until now.
(This is where DeepMind enters the chat.)
What AlphaGenome Actually Does
DeepMind just released a model that predicts the function of DNA sequences - specifically that mysterious 98% that doesn't code for proteins.

You feed it up to 1 million DNA base pairs.
It tells you what those sequences actually do.
You can access the model here: AlphaGenome
Gene expression levels. Splicing patterns. Chromatin features. 3D DNA structure. Thousands of molecular properties, predicted at single-nucleotide resolution.
One prediction takes about one second.
(One second. For a million base pairs. I had to sit with that for a minute.)
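(If you're wondering what "feeding in" DNA even looks like: sequence models of this kind typically take the letters as a one-hot matrix, one row per base. Here's a minimal sketch of that encoding. The function name and the choice to zero out unknown bases are my own assumptions for illustration, not details taken from AlphaGenome.)

```python
import numpy as np

BASES = "ACGT"

def one_hot_encode(seq: str) -> np.ndarray:
    """Return a (len(seq), 4) array; unknown bases such as N become all-zero rows."""
    index = {base: i for i, base in enumerate(BASES)}
    encoded = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        col = index.get(base)
        if col is not None:  # leave the row at zero for anything outside A/C/G/T
            encoded[pos, col] = 1.0
    return encoded

example = one_hot_encode("ACGTN")
print(example.shape)
```

A million base pairs encoded this way is just a 1,000,000 x 4 grid of zeros and ones.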

To understand why this matters, you need to know about a problem that's been haunting genetics for years.
The Problem This Solves
Scientists have been running genome-wide association studies (GWAS) since the mid-2000s. They compare DNA from sick people to DNA from healthy people and look for differences.
These studies work. They find thousands of genetic variants associated with diseases.
Here's the catch: most of those variants sit in non-coding regions.
The "junk" DNA.
So researchers would identify a suspicious mutation and then... have no efficient way to figure out if it actually did anything. Does this variant cause the disease? Or is it just sitting nearby while something else causes the problem?
They were essentially guessing.
Running experiments on every candidate would take lifetimes. There are millions of variants to check.
AlphaGenome changes this equation completely.
Feed in the normal sequence. Feed in the mutated sequence. Compare the outputs across 7,000+ genomic tracks.
Now you know which variants are actually doing something versus which ones are just along for the ride.
(Imagine having a list of a thousand suspects and finally getting a tool that can narrow it down to five.)
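The reference-versus-variant comparison described above can be sketched in a few lines. To be clear about what's assumed here: `toy_predict_tracks` is a hypothetical stand-in I made up so the example runs on its own (it just measures GC content at a few window sizes), and the "largest absolute change per track" score is one simple choice, not necessarily how AlphaGenome scores variants.

```python
import numpy as np

def toy_predict_tracks(seq: str, n_tracks: int = 4) -> np.ndarray:
    """Hypothetical stand-in for a sequence-to-function model (NOT AlphaGenome).

    Track k is the GC fraction in a sliding window of width 2k+1, so each
    track sees a different amount of context. Output shape: (len(seq), n_tracks).
    """
    gc = np.array([1.0 if base in "GC" else 0.0 for base in seq.upper()])
    tracks = []
    for k in range(n_tracks):
        width = 2 * k + 1
        kernel = np.ones(width) / width
        tracks.append(np.convolve(gc, kernel, mode="same"))
    return np.stack(tracks, axis=1)

def apply_snv(seq: str, pos: int, alt: str) -> str:
    """Swap in a single alternate base at a 0-based position."""
    return seq[:pos] + alt + seq[pos + 1:]

def score_variant(seq: str, pos: int, alt: str, predict=toy_predict_tracks) -> np.ndarray:
    """Per-track score: the largest absolute change anywhere in the sequence."""
    ref_tracks = predict(seq)
    alt_tracks = predict(apply_snv(seq, pos, alt))
    return np.abs(alt_tracks - ref_tracks).max(axis=0)

reference = "AT" * 10
candidates = [(10, "G"), (10, "T")]  # made-up variants for illustration

# Rank candidates by their strongest effect on any track.
ranked = sorted(
    candidates,
    key=lambda v: score_variant(reference, *v).max(),
    reverse=True,
)
for pos, alt in ranked:
    print(pos, alt, score_variant(reference, pos, alt).round(3))
```

Swap the toy predictor for a real model and the shape of the workflow stays the same: two predictions, one subtraction, one ranking.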

Figure: Example of AlphaGenome track predictions and detailed performance evaluations
The Technical Bit (Quick Version)
The architecture combines two approaches:
Convolutional neural networks - these detect short local patterns in the DNA sequence
Transformers - these capture long-range interactions across the entire million-base-pair input
(Yes, the same transformer architecture behind GPT. Turns out it's useful for more than chatbots.)
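If you want a feel for how those two pieces divide the work, here's a toy sketch in plain NumPy. Everything in it is an assumption for illustration: the "TATA" motif, the sequence, and the stripped-down attention with no learned weights. It is nowhere near the real architecture. It just shows a convolution spotting a short local pattern, then attention letting two distant hits find each other.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as a (length, 4) matrix of zeros and ones."""
    x = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        x[i, BASES.index(base)] = 1.0
    return x

def motif_scan(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """1D convolution step: slide the kernel along the sequence and count matches."""
    width = kernel.shape[0]
    return np.array(
        [np.sum(x[i:i + width] * kernel) for i in range(len(x) - width + 1)]
    )

def self_attention(h: np.ndarray):
    """Scaled dot-product self-attention without learned projections."""
    scores = h @ h.T / np.sqrt(h.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ h, weights

# Two copies of a made-up motif, ten bases apart.
seq = "GGG" + "TATA" + "GGGGGG" + "TATA" + "GGG"

local = motif_scan(one_hot(seq), one_hot("TATA"))  # local pattern detector
hits = np.flatnonzero(local == 4)                  # positions with a perfect match
mixed, weights = self_attention(local[:, None])    # long-range mixing step

print(hits)
print(weights[hits[0], hits[1]].round(3))
```

The first motif hit puts nearly half of its attention on the second one, even though a convolution four bases wide could never see both at once.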
Training was distributed across multiple TPUs. The model outperformed specialised tools on 22 of 24 DNA sequence prediction tasks and matched or beat top models on 25 of 26 variant-effect evaluations.
It's not a marginal improvement. It's a generational leap in capability.
What This Unlocks
Cancer research
Tumour genomes are messy. Hundreds or thousands of mutations. But only some of them are actually driving the cancer - the rest are "passengers" that happened along the way but don't matter.
Figuring out which is which has been a massive bottleneck. AlphaGenome gives researchers a way to prioritise.
Rare disease diagnosis
Here's a stat that surprised me: cryptic splice variants account for roughly 9-11% of pathogenic mutations in rare genetic disorders.
These are mutations that mess up how a gene's RNA gets cut and stitched together, without obviously changing the protein-coding sequence itself. They're notoriously hard to identify. AlphaGenome predicts splicing directly from the DNA sequence.
(For families stuck in diagnostic odysseys, this could be the difference between answers and endless uncertainty.)
Gene therapy design
If you want to engineer DNA to treat a condition, you need to predict how your modifications will affect gene regulation.
Previously, this was a lot of trial and error. Expensive trial and error.
Now you can model it computationally before touching a single cell.
Drug development
Pharmaceutical companies have millions of candidate variants to investigate. Lab time is expensive. AlphaGenome helps narrow down which ones are actually worth the investment.
What It Doesn't Do
I want to be clear about the limitations because the hype cycle around AI in biology can get out of control.
This is not a crystal ball for your health.
It doesn't predict complex diseases involving multiple genes, environmental factors, or higher-order biological interactions. Your risk of heart disease or diabetes involves way more than any single model can capture.
It doesn't work well for very long-range regulation - when a regulatory element sits more than about 100,000 base pairs from the gene it affects, the predictions get unreliable.
And importantly, it's explicitly not validated for clinical use. This is a research tool for scientists. Not a diagnostic you'd use to make medical decisions.
(If someone tries to sell you a "health report" based on AlphaGenome, run.)
The DeepMind Pattern
This follows a clear trajectory:
AlphaFold (2021) - Predicted protein structures. The work earned Demis Hassabis and John Jumper a share of the 2024 Nobel Prize in Chemistry.
AlphaMissense (2023) - Predicted the effects of missense mutations in the 2% of your genome that codes for proteins.
AlphaGenome (2025) - Predicts the function of the other 98%.
Each tool chips away at a different piece of the puzzle, connecting your DNA to what actually happens in your body.
The "genotype to phenotype" problem, as scientists call it.
DeepMind is systematically solving it, piece by piece.
(Whether you find that exciting or slightly terrifying probably depends on your general feelings about powerful AI in the hands of large corporations.)
Current Adoption
Since the preview launch in June 2025:
→ About 3,000 scientists across 160 countries have used it
→ Roughly 1 million API requests per day
→ Free for non-commercial research
→ Full Nature paper published January 2026
The research community is not sleeping on this.
Why I Think This Matters Beyond Biology
We're watching AI systematically unlock domains that were previously intractable.
Predicting how proteins fold was considered one of the hardest problems in biology. AlphaFold largely cracked it.
Understanding non-coding DNA was another one. AlphaGenome just made a massive dent.
The pattern isn't "AI is good at games and chatbots." The pattern is "AI is becoming genuinely useful for hard scientific problems that humans couldn't brute-force."
Drug discovery. Materials science. Climate modelling. The list of domains where this approach might work keeps growing.
(I'm not saying AI solves everything. I'm saying the ceiling on what it can contribute keeps rising faster than most people expected.)
98% of your DNA isn't junk.
It's the instructions we couldn't read.
Now we're starting to read them.
The implications for medicine, for understanding disease, for designing therapies - they're going to unfold over years and decades.
But the foundation just got built.
If this was interesting, share it with someone who still thinks 98% of DNA is junk.
More AI breakdowns every week. Subscribe if you haven't.
See you next time.