Eli Lilly just opened access to AI models trained on decades of proprietary drug data. 1,300 biotech companies can now use them for free. That's not generosity. That's a signal about where value is actually going.

When pharma opens the vault
Fig. 1 THE SIGNAL

For decades, pharma companies guarded their internal data like state secrets. The datasets from millions of experiments (what worked, what didn't, which molecules were toxic, which ones couldn't be manufactured) were competitive advantages worth billions.

Lilly just opened the vault.

On January 8, 2026, Benchling announced a partnership with Lilly TuneLab that gives every company on Benchling's platform access to prediction models trained on decades of Lilly's proprietary research. Hundreds of thousands of molecules. ADME, safety, developability predictions. The same tools Lilly scientists use internally.

TuneLab launched in September 2025 as part of Lilly's Catalyze360 program. Smaller biotechs get access to sophisticated AI models. Lilly gets data back through federated learning. The models improve. Everyone benefits. At least that's the pitch.
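For readers who haven't met federated learning before, here is a minimal sketch of the general idea, using the textbook federated averaging recipe. It is an illustration only and not a description of how TuneLab is built: the two "biotechs", their toy data, the linear model, and every function name below are hypothetical. Each participant trains on its own private data, and only the model weights travel back to be averaged.

```python
from typing import List, Tuple

Weights = List[float]
Dataset = List[Tuple[List[float], float]]  # (features, target) pairs


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.05, epochs: int = 20) -> Weights:
    """Run gradient descent on one participant's private data.

    Only the resulting weights are returned; the raw data stays local.
    """
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2 * err * xj / len(data)
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Average participants' weights, weighted by their sample counts."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[j] * n for w, n in updates) / total for j in range(dim)]


def run_rounds(global_w: Weights, participants: List[Dataset],
               rounds: int = 30) -> Weights:
    """Repeat: send the shared model out, train locally, average the results."""
    for _ in range(rounds):
        updates = [(local_update(global_w, d), len(d)) for d in participants]
        global_w = federated_average(updates)
    return global_w


# Hypothetical toy data: both participants hold private samples of
# y = 1 + 2 * x, encoded as features [1.0, x] so the first weight is the bias.
biotech_a: Dataset = [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0)]
biotech_b: Dataset = [([1.0, 2.0], 5.0), ([1.0, 3.0], 7.0), ([1.0, 0.5], 2.0)]

model = run_rounds([0.0, 0.0], [biotech_a, biotech_b])
```

Neither participant ever sees the other's samples, yet the shared model ends up fitting both. That is the trade on offer: you get a better model, and the model owner gets the benefit of data it never directly holds.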

"Lilly TuneLab was created to be an equalizer for smaller companies."

— Daniel Skovronsky, M.D., Ph.D., Chief Scientific Officer, Eli Lilly

The Benchling partnership scales this dramatically. Benchling is the dominant R&D platform in biotech. Over 1,300 companies use it, from scrappy startups to Merck, Moderna, and Sanofi. Now every one of them can run Lilly's prediction models directly in their workflows. Upload a molecule, get back predictions on whether it'll be absorbed, whether it's toxic, whether it can actually be manufactured. The kind of answers that used to require months of lab work.

Here's what caught my attention. When a top-tier pharmaceutical company starts giving away its discovery tools, it's signaling that the tools themselves are no longer the differentiator. They're table stakes.

"We're building the infrastructure that makes sophisticated AI accessible to the entire industry."

— Sajith Wickramasekara, CEO and Co-founder, Benchling

The strategic logic makes sense once you see it. Lilly isn't in the business of selling software. They're in the business of developing drugs. By democratizing discovery tools, they shift competition to where they still have the advantage: the capacity to develop, manufacture, and commercialize the resulting candidates.

It's the same playbook tech giants used when they open-sourced Kubernetes and PyTorch. Commoditize the infrastructure. Compete on the application.

The algorithm isn't the moat anymore
Fig. 2 THE CONTEXT

Lilly isn't an outlier here. The entire discovery layer is commoditizing, and it's happening faster than anyone expected.

In November 2024, Google DeepMind released AlphaFold 3's source code after a scientist backlash over restricted access. The community's message was clear: gate-keep this technology and we'll route around you.

They did. MIT released Boltz-1, an open-source model matching AlphaFold 3's accuracy with full commercial rights. No licensing fees. No IP restrictions. Any biotech with a laptop can use it.

The Baker Lab at University of Washington open-sourced RFdiffusion, which has become the industry standard for de novo protein design. Companies like Levitate Bio and Tamarind Bio now offer it as a commodity backend service. When a breakthrough AI model becomes a menu item in a SaaS platform, it stops being a source of competitive advantage.

TIME FROM PROPRIETARY MODEL RELEASE TO OPEN-SOURCE EQUIVALENT

~7 months

And falling.

Think about what this means for startups founded between 2020 and 2023 whose primary asset was a proprietary folding or generation algorithm. The moat they built their companies on? It's gone. Any algorithmic advantage gets arbitraged away within 6 to 12 months.

The question isn't "Can AI design a drug?" anymore. Multiple AI-generated molecules are in Phase 2 and 3 trials. Insilico Medicine just went public in Hong Kong. The technology works.

The question now: if everyone can design molecules, what actually determines who wins?

The bottleneck moved downstream
Fig. 3 THE SHIFT

Here's the paradox nobody talks about. AI has slashed discovery timelines. But the cost to develop a single drug has climbed to $2.3 billion. The success rate for drugs entering Phase 1 has dropped to 6.7%.

How is that possible?

The digital output of discovery is vastly outpacing the physical throughput of development. We got really good at designing molecules on computers. We didn't get any better at making them, testing them in humans, or getting them approved.

Three bottlenecks now determine who wins:

1. Manufacturing

Over 50% of recent FDA Complete Response Letters cited manufacturing issues. Not efficacy problems. Not safety concerns. The drug worked fine. They just couldn't make it reliably.

Cell therapies are the extreme case. CAR-T manufacturing costs exceed $200,000 per patient. The process is manual, batch-variable, and fundamentally unscalable. You're not going to cure millions of cancer patients with an artisanal manufacturing process.

Cellares signed a $380 million capacity reservation and supply agreement with Bristol Myers Squibb for its automated "smart factories" for cell production. Their Cell Shuttle platform is essentially a robot that does what rooms full of technicians used to do, but faster, cheaper, and more consistently. Ori Biotech is taking a different approach with compact, closed systems that can sit inside existing facilities.

The play here isn't discovering better therapies. It's building the factories that can actually produce them at scale.

2. Clinical Trials

Patient recruitment is the black hole of drug development. Finding patients who match increasingly specific biomarker criteria can take longer than running the trial itself. You design the perfect study, then spend two years trying to find 200 patients who qualify.

Deep 6 AI mines the messy parts of electronic health records (physician notes, pathology reports, the unstructured text that standard database queries can't search) to find patients hiding in plain sight. Phesi takes a different angle: they simulate trials before they start, using data from 100 million patients to flag protocols that will fail on recruitment before you waste money learning that the hard way.

The bottleneck isn't running trials. It's finding the patients to put in them.

3. Regulatory

Nobody talks about this one, but the documentation burden is crushing. Clinical Study Reports, patient narratives, FDA submissions. Mountains of text that have to be written, reviewed, cross-referenced, and formatted perfectly.

Yseop demonstrated that generative AI can reduce patient narrative writing from 4 hours to 4 seconds. That's not a typo. Certara integrated their CoAuthor tool with Veeva's regulatory system to create end-to-end submission pipelines.

Here's the kicker: the FDA itself deployed an internal AI tool called "Elsa" to help reviewers analyze submissions. When the regulator adopts AI, the industry has to follow.

FDA Complete Response Letters citing manufacturing issues

50%+

The drug worked. They couldn't make it.

The winners of the next cycle won't be those who can design the billionth novel binder on a GPU. They'll be those who solve the physical layer problems: manufacturing, patient identification, regulatory automation.

The counter-argument
Fig. 4 THE VENTURE

Not everyone agrees discovery is commoditized.

Xaira Therapeutics raised over $1 billion betting the opposite. Their thesis: public data is observational and biased toward what already exists. Real discovery requires data on what happens when you intervene.

Xaira built an industrial Perturb-seq platform that generates data open-source models can't access. They use CRISPR to perturb every gene in the genome across millions of cells and measure what happens. That's not something you download from a public database.

Their argument is subtle but important. While RFdiffusion solves structure (what the molecule looks like), proprietary data addresses function (what the molecule does to the cell). Structure is solved. Function is the frontier.

This creates a strategic split in the industry.

Horizontal platforms like Benchling, Certara, and Tamarind Bio sell picks and shovels to everyone. They benefit from commoditization because it increases the volume of assets flowing through their infrastructure. More molecules designed means more molecules needing their services.

Vertical biotechs like Xaira use proprietary data to become fully integrated drug companies. They view AI as an internal engine, not a product to sell.

Both models can win. The question is which layer you're betting on.

The investment barbell: On one end, infrastructure companies selling to everyone (Cellares, Deep 6 AI, Yseop). On the other, data-rich verticals generating proprietary training data at scale (Xaira). The middle is dangerous. Pure-play algorithm companies without massive wet-lab data engines are exposed. Any model advantage gets copied in months.

Fig. 5 WHAT'S NEXT

The Lilly-Benchling partnership goes live later in 2026. Watch for adoption numbers: how many of those 1,300 companies actually use the models, and what they build with them.

Benchling is a likely IPO candidate this year. They hit a $6.1 billion valuation in 2021, have 1,300+ customers, and just landed the marquee AI partnership of the year. A successful listing would validate the thesis that infrastructure companies, not discovery platforms, capture value in this era.

Other pharma companies will respond. Expect similar partnerships from Novartis, Pfizer, or AstraZeneca. If discovery tools are commoditizing, the competitive response is to commoditize faster than your rivals. Open-source your models before someone else clones them anyway.

The real test comes when AI-designed molecules hit Phase 2 trials in greater numbers. That's where target selection matters. That's where we find out if the algorithms actually pick winners or just generate candidates faster. So far, Phase 2 success rates for AI-designed drugs look about the same as the industry average: roughly 40%. AI is compressing timelines and improving chemistry. It hasn't cracked target validation yet.

Benchling valuation (Series F, 2021)

$6.1B

IPO candidate for 2026.

The science of drug discovery is largely solved. What remains is the engineering of drug development: manufacturing at scale, finding patients, getting through regulatory review. That's where the value is moving. Lilly sees it. That's why they gave away the models.

If this was useful, forward it to someone building in biotech. More stories like this every week at Thinking Folds.
