Kits, Cores, Services & Progress in Biotech Workflows
Today we're talking about what drives change in biotech R&D and what might be coming next.
Transcript
The next time you're at a biotech event and you need to kill a few hours, here's what you do. You go round up all the old timers and get them talking about how things used to be.
How did you do a DNA extraction? Ah! We used to do that with phenol-chloroform.
What about DNA sequencing? Let me tell you about polyacrylamide gels.
PCR primers? Those things used to cost $100 each.
Lately, I've been paying attention to these stories because I'm curious about the history of change in the way we do biotech. Why do some methods look radically different from 20 years ago and some not at all? Is there a pattern to these changes - something that could help me see what's coming?
One pretty clear trend is kitification - the use of commercial reagent kits. A kit collects all of the common materials and chemicals for a particular workflow into a convenient box, along with a standard protocol for how to use them. Popular examples include kits for DNA extraction, for PCR reagents, and for detecting proteins with ELISA.
Kits tend to emerge for protocols that are versatile - when you have a set of core components that enable many different kinds of experiments. They're simple to use, relatively cheap, and low-throughput, meaning they’re mostly for stuff you can do by hand.
At the other end of the cost curve, we see core facilityization. The construction of core facilities. Basically the transfer of certain lab operations to a separate, purpose-built lab. For example, if you've got a simple optical microscope, you probably just keep that on a bench. But the big scopes, the confocal, the AFM, those get put in a different building run by experts who know how to use them. Other common core facilities run the flow cytometers and mass spectrometers, or handle the animal experiments.
Core facilities make sense for workflows that are particularly specialized, expensive, and powerful enough to justify those costs. Unlike using a kit, working with a core facility is often a major time investment. It requires collaboration with the facility's experts to ensure the fancy equipment is used in a way that aligns with the experimental goals.
Finally, in the last 30 years we’ve seen certain key operations in biotech R&D move to a service model. This is most obvious around DNA sequencing and DNA synthesis. It's hard to believe there was a time when labs synthesized their own primers. And then there was a time when R&D groups had DNA synthesis core facilities. It's unthinkable now. Everyone buys primers from a commercial service and they love it.
There are other biotech services available, but I think the popularity of DNA synthesis as a service gives us a clue about what drives this kind of progress: flexibility and scale. Flexibility, because DNA is one kind of molecule with effectively infinite sequence variations and endless applications. Scale, because when you need DNA, you usually need a lot of it - many different sequences. But you don't want to invest in the infrastructure to make it yourself, because DNA chemistry is not your core specialty. So you order some DNA and you get on with doing your science.
So what's next? What's the thing that, 10 years from now, will have the old timers saying "Wow, can you believe we used to have to do that in house?" Well, I think a similar need for flexibility and scale is coming to a lot of lab data. In the AI era, many common types of lab data want to run on a service model.
AI can never get enough data. A kit that you buy and use by hand is not going to satisfy it. AI requires many different kinds of data, and no single core facility can provide them all. The long historical trends toward bigger data and specialized equipment are only accelerating. 96-well plates became 384-well plates. CRISPR knockouts became pooled CRISPR screens. RNA-seq became single-cell RNA-seq. Cell Painting became high-throughput, high-content imaging.
A multimodal, large-scale dataset is going to be a core part of most R&D projects in the near future, just like DNA synthesis is today. It won't be the whole project - it will be one resource that your biology lab calls on to get your science done. It will be routine. Boring, even.
There was a time when doing an experiment with a single synthetic DNA molecule got you a high-impact paper. Then it required tens or hundreds. Longer sequences, larger libraries, until eventually the new capability was just part of everyone's process. But every time the technology advanced, the teams that were fastest to adapt to the new way of working had an edge.
Biological data for AI will look like that. The whole ecosystem is so hungry for data right now. The teams that get good at working with Lab Data as a Service - for example, the data that we sell at the Ginkgo foundry - will have the edge.
Someday we'll look back at this era and we'll be the old timers. Some young kid will round us up and say, "Tell me what it was like back when you had to generate datasets for AI by hand." And we'll say, "Sit down, whippersnapper, and let me tell you about the 2020s."