For years, predictive biology was sold as a modeling breakthrough. The pitch was simple:
if we gather massive biological datasets and train large models on them, we can predict what will happen inside a cell, design better molecules, and speed up discovery. Models would explore the search space. Labs would validate. Drug discovery would shift from trial-and-error to engineering.
That story held up for a while.
But the field is now facing a harder truth:
Prediction isn’t the bottleneck anymore.
Intervention is.
And that distinction is reshaping the entire commercial landscape.
Prediction works, until the system moves
Today’s models are good at recognizing patterns in biological data.
They can:
classify cell states
group similar molecules
map correlations across modalities
But once the system changes, most models fall apart.
Two genes can correlate perfectly and behave differently when perturbed.
A molecule that looks ideal in silico can fail immediately in vitro.
A model that excels on static benchmarks can’t predict how a cell will respond to a real-world intervention.
Predicting what something is differs fundamentally from predicting what will happen when you change it.
Biology depends on interventions.
Most models weren’t built for that.
They describe patterns, but describing is not understanding — and understanding is what discovery requires.
The missing foundation: causality
The real value in predictive biology is in answering counterfactuals:
What happens if we knock down this gene?
What happens if we add this molecule?
What happens if we combine these signals?
Most commercial models cannot answer these reliably.
They are excellent interpolators, but weak causal reasoners.
They treat biology like a static dataset rather than a dynamic, adaptive system.
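The gap between correlation and intervention is easy to see in a toy simulation. Below is a minimal sketch, assuming a hypothetical hidden regulator Z that drives two genes X and Y: observationally X and Y correlate almost perfectly, yet knocking X down leaves Y untouched, because the correlation flows through Z rather than from X to Y.

```python
# Toy structural causal model (illustrative, not real biology):
# a hidden regulator Z drives the expression of genes X and Y.
# X and Y correlate strongly, yet an intervention on X leaves Y unchanged.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Observational data: Z -> X and Z -> Y, with no edge between X and Y.
z = rng.normal(size=n)
x = 2.0 * z + rng.normal(scale=0.1, size=n)
y = -1.5 * z + rng.normal(scale=0.1, size=n)
print(f"observational corr(X, Y): {np.corrcoef(x, y)[0, 1]:.2f}")  # near -1

# Interventional data: do(X = 0), i.e. knock X down regardless of Z.
x_kd = np.zeros(n)
y_kd = -1.5 * z + rng.normal(scale=0.1, size=n)  # Y still follows Z only
print(f"mean Y before knockdown: {y.mean():+.3f}")
print(f"mean Y after  knockdown: {y_kd.mean():+.3f}")  # effectively unchanged
```

A model trained only on the observational rows would predict that changing X changes Y, and would be wrong. Only perturbation data distinguishes the two graphs.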
Closing this gap requires:
causal inference frameworks
high-quality perturbation datasets
models trained on experiments designed for interventions
This is why the field is moving away from pure software companies and toward organizations that blend modeling with large-scale, automated experimentation.
Without causality, predictive biology remains descriptive.
With causality, it becomes actionable.
The industrial loop: the emerging competitive advantage
A small subset of companies has recognized this and built the entire loop:
generate data
build models
design molecules
run automated experiments
feed the results back into the system
The loop is simple on paper, but brutally hard in practice.
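On paper, the loop reduces to a few lines. Here is a minimal sketch in which every component is a stand-in for a real system: `run_assay` plays the role of the automated wet lab, `fit` the model, and `propose` the generative design step. All names and the hidden dose-response curve are illustrative assumptions, not any company's actual pipeline.

```python
# Minimal sketch of the design-build-test-learn loop described above.
import random

random.seed(0)

def run_assay(candidate: float) -> float:
    """Stand-in for an automated experiment: a noisy hidden dose-response."""
    return -(candidate - 0.7) ** 2 + random.gauss(0, 0.01)

def fit(data):
    """Stand-in model: remember the best-scoring candidate seen so far."""
    return max(data, key=lambda pair: pair[1])[0]

def propose(center: float, n: int = 8):
    """Stand-in generative design: sample candidates near the current best."""
    return [min(1.0, max(0.0, random.gauss(center, 0.1))) for _ in range(n)]

# The loop: generate data -> build model -> design -> experiment -> feed back.
data = [(c, run_assay(c)) for c in (0.1, 0.5, 0.9)]   # seed experiments
for round_ in range(5):
    best = fit(data)                                   # build model
    candidates = propose(best)                         # design candidates
    data += [(c, run_assay(c)) for c in candidates]    # run automated assays

print(f"best candidate after 5 rounds: {fit(data):.2f}")  # should approach 0.7
```

The hard part is not this control flow; it is making `run_assay` real, which is where the robotics, phenotyping, and capital requirements below come in.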
It requires:
automated wet labs
robotics
multimodal phenotyping
generative design systems
capital to run thousands of experiments per week
These companies don’t look like software startups.
They look like biological factories.
And pharma has noticed.
Instead of trying to build these loops internally, pharma increasingly relies on companies that already operate them.
This shift is the core commercial consequence of predictive biology moving beyond models.
A fragmented ecosystem around the loop
Predictive biology companies now fall into four categories:
1. Integrated discovery engines
Recursion + Exscientia, Xaira
→ phenomics, generative design, chemistry automation, and huge experimental throughput
→ true end-to-end discovery engines
2. Model-first platforms
Isomorphic Labs, EvolutionaryScale, Profluent
→ state-of-the-art models
→ rely on partners for experiments
3. High-throughput specialists
Terray, Enveda, Iambic, BigHat, Absci, LabGenius
→ tight automated loops for specific modalities (chemistry, proteins, metabolomics)
4. Infrastructure providers
Ginkgo, Strateos, Culture Biosciences
→ the backend: robotic labs, cloud bioreactors, automated execution
Capabilities vary widely.
Some companies run continuous automated loops.
Others rely on manual execution or patchy data.
Industrial biology exists, but only a handful of companies operate it at true scale.
What changes next
The next phase of predictive biology depends on three foundations:
1. Mechanistic grounding
Models must reason about interventions, not just patterns.
2. Unified experimental scaffolds
Labs must generate reliable, standardized perturbation data at scale.
3. Fully integrated loops
Design and execution must operate as one continuous system.
These are not solved by bigger models or more compute.
They are solved by better feedback, tighter integration, and deeper biological grounding.
To conclude
Predictive biology began as a modeling revolution.
Now it is becoming an industrial one.
The field is transitioning from:
standalone AI platforms → integrated biological systems
static predictions → dynamic interventions
algorithms → loops
correlation → causality
isolated models → continuous, automated cycles
The companies that understand this shift are building the next generation of discovery.
The ones that don’t are still doing computational biology — just with larger models.

