Function Landscapes and the Limits of AI in Bio
I'm excited about AI as a tool for taking on the hardest problems in biotech. I'm an optimist. But I'm also a biologist, so I know the hardest problems are hard for a reason. Today we're drawing a map of biological complexity and thinking about where AI might get lost.
Transcript
A biological system is more than the sum of its parts. Inside every cell is a complex machine: DNA, RNA, proteins, chemical reactions, transcriptional feedback loops, you name it, all of these different biological parts interacting in a very crowded cellular environment.
Engineering biology means making changes to this machine to create a function. These might be small changes like editing individual base pairs in a genome. Or they might be bigger changes like introducing an engineered protein or a new metabolic pathway. To make it work, we have to understand how the changes will interact with this machine.
Now, synthetic biologists have many strategies for modeling biological systems. Many of them are very cool; none of them are perfect. It seems clear that AI-based tools are going to add a lot of value here, because we've seen how useful AI can be for finding patterns in big datasets. So I want to offer one analogy for thinking about this problem and what makes it hard.
What I have in mind is a functional landscape. This is adapted from an idea in evolutionary biology called a fitness landscape. In evolution, scientists think about how changes in DNA affect the reproductive fitness of an organism. We can take that idea and generalize it to cover functions other than reproduction. Producing a valuable biomaterial, treating a rare disease, adding delicious flavor to our cheese: any function we can measure, we can map onto a functional landscape.
Here's how it works. We're going to build this up in 3 dimensions. Any given point on the landscape corresponds to a particular DNA sequence. The X and Y dimensions of our landscape represent changes to that sequence. Now, we have to think a little abstractly here because, of course, there are many more than 2 kinds of genetic changes we can make. For this to work, we have to assume that the distances on this map correspond to the degree of genetic change. A small move might represent editing a single base pair. A big move might be adding a whole new gene.
The Z dimension on this map represents the measure of the function that we're interested in. If we're designing an industrial enzyme, Z might be the chemical reaction rate. If we're designing a cell therapy, Z could be how well our receptors recognize a tumor.
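To make the geometry concrete, here is a minimal Python sketch under simplifying assumptions. A genotype is a short DNA string, "distance" is the number of substituted bases (Hamming distance, which ignores insertions such as adding a whole gene), and the Z value comes from a made-up scoring function, `toy_function`, that rewards matching a hypothetical target motif. None of these names come from a real tool.

```python
# A minimal sketch, assuming genotypes are equal-length DNA strings and
# that "distance" on the landscape means the number of substituted bases.

def hamming(a: str, b: str) -> int:
    """Number of positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))

def single_edit_neighbors(seq: str) -> list[str]:
    """All sequences one base substitution away: the smallest possible step."""
    out = []
    for i, base in enumerate(seq):
        for alt in "ACGT":
            if alt != base:
                out.append(seq[:i] + alt + seq[i + 1:])
    return out

# Hypothetical Z axis: the fraction of positions matching a made-up target motif.
TARGET = "GATTACA"

def toy_function(seq: str) -> float:
    return 1 - hamming(seq, TARGET) / len(TARGET)

start = "GATTTCA"  # one substitution away from the hypothetical target
best_step = max(single_edit_neighbors(start), key=toy_function)
print(best_step, toy_function(best_step))  # GATTACA 1.0
```

A real function measure would come from an assay, not a formula; the point is only that every sequence is a location, and every location has a height.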
The reason functional landscapes are appealing as a visual tool is that they take all the messy complexity of a biological system and reduce it to the simple idea of climbing a mountain. We want to go up. That's all. Somewhere on this landscape is the highest mountain, and our job is to keep making changes to the genome until we find that peak.
When the functional landscape is smooth, life is easy. On a smooth landscape, small changes to the genotype result in small, steady changes to the function. Life is even easier when the landscape is monotonic, meaning that there is exactly one peak and that all the slopes point in that direction.
But functional landscapes can also be incredibly rugged. On a rugged landscape, small changes to a DNA sequence cause function to zig-zag up and down. Each step is different from the one before and the one after, so it’s hard to know if you’re making progress.
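One rough way to put a number on "rugged" is to count local peaks: designs that score higher than every design one small step away. The sketch below assumes a hypothetical 11 x 11 grid of designs and compares a smooth single-peaked function against the same function with large, fixed-seed random noise added; both landscapes are invented for illustration.

```python
import random

SIZE = 11  # hypothetical 11 x 11 grid of designs; (x, y) are abstract genetic-change coordinates

def smooth(x: int, y: int) -> float:
    # One broad peak at (5, 5): every step toward it raises the function.
    return -((x - 5) ** 2 + (y - 5) ** 2)

# Fixed-seed noise so the rugged landscape is identical on every run.
_rng = random.Random(0)
_NOISE = {(x, y): _rng.uniform(-30, 30) for x in range(SIZE) for y in range(SIZE)}

def rugged(x: int, y: int) -> float:
    # Same underlying trend, but neighboring designs can differ wildly.
    return smooth(x, y) + _NOISE[(x, y)]

def neighbors(x: int, y: int):
    """Designs one small step away (up, down, left, right) that stay on the grid."""
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < SIZE and 0 <= ny < SIZE:
            yield nx, ny

def count_peaks(f) -> int:
    """Number of local peaks: points strictly higher than every neighbor."""
    return sum(
        all(f(x, y) > f(nx, ny) for nx, ny in neighbors(x, y))
        for x in range(SIZE)
        for y in range(SIZE)
    )

print(count_peaks(smooth))  # 1
print(count_peaks(rugged))
```

On the smooth landscape every uphill step leads to the same summit; on the rugged one, many designs look like a summit from where you stand.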
In the early days of genetic engineering, usually only small changes to a DNA sequence were possible, so researchers had to climb functional landscapes one step at a time. After each change, they would measure the function to confirm that they were still going uphill. The limitations of this strategy, I think, are pretty clear. It takes forever because you have to pause after every step to measure what you made. And you can only explore locally, never knowing whether there might be a much taller mountain somewhere else on the landscape.
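The step-by-step strategy is essentially greedy hill climbing. Here is a minimal sketch on a hypothetical one-dimensional landscape with a small hill and a taller mountain; each neighbor evaluation stands in for one lab measurement, and the landscape shape and the `hill_climb` helper are assumptions made for illustration.

```python
def landscape(x: int) -> float:
    # Hypothetical 1D landscape: a small hill peaking at x=3 and a taller
    # mountain peaking at x=15, separated by a low valley.
    return max(10 - 2 * abs(x - 3), 30 - 3 * abs(x - 15), 0)

def hill_climb(f, start: int, lo: int = 0, hi: int = 20):
    """Greedy local search: measure the neighbors, move only if one is higher.

    Returns (final position, final value, number of measurements taken).
    """
    x, current = start, f(start)
    measurements = 1
    while True:
        candidates = [n for n in (x - 1, x + 1) if lo <= n <= hi]
        scored = [(f(n), n) for n in candidates]  # one "experiment" per neighbor
        measurements += len(scored)
        best_value, best_x = max(scored)
        if best_value <= current:
            return x, current, measurements  # no uphill step left: a local peak
        x, current = best_x, best_value

print(hill_climb(landscape, start=0))  # (3, 10, 8): stuck on the small hill
print(max(range(21), key=landscape))   # 15: where the tall mountain actually is
```

Starting at x = 12 instead, the same climber reaches the summit at x = 15, which is the catch: where you end up depends on where you happened to start.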
A better way is to sample the landscape and generate a map. Imagine laying out a grid of points, each one corresponding to a different DNA design. We can use computational modeling to infer what is happening in between the points that we measure. Then, if the models are correct, we can predict where the highest point might be. With modern DNA synthesis and gene editing, we can jump right there without having to take all the steps in between.
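Here is a minimal sketch of "sample, map, then jump," assuming a hypothetical one-dimensional landscape and using plain piecewise-linear interpolation as a stand-in for a real predictive model. We measure every fourth design, fill in the gaps from those six measurements, and go straight to the predicted peak.

```python
def landscape(x: int) -> float:
    # Hypothetical 1D landscape: a small hill at x=3, a taller mountain at x=15.
    return max(10 - 2 * abs(x - 3), 30 - 3 * abs(x - 15), 0)

def interpolate(samples: dict[int, float], x: int) -> float:
    """Piecewise-linear estimate of f(x) from measured (x, f(x)) pairs."""
    if x in samples:
        return samples[x]
    xs = sorted(samples)
    for left, right in zip(xs, xs[1:]):
        if left < x < right:
            t = (x - left) / (right - left)
            return samples[left] + t * (samples[right] - samples[left])
    raise ValueError("x is outside the sampled range")

grid = range(0, 21, 4)  # measure only 6 designs: 0, 4, 8, 12, 16, 20
samples = {x: landscape(x) for x in grid}

# Use the map to pick the most promising design among all 21, then "synthesize" it.
predicted_peak = max(range(21), key=lambda x: interpolate(samples, x))
print(predicted_peak, landscape(predicted_peak))  # 16 27
```

Six measurements land one step from the true peak at x = 15, on the taller mountain, while a greedy step-by-step climb starting at x = 0 on this landscape would stop on the small hill at x = 3. A real model would be far more sophisticated than linear interpolation, but the workflow is the same.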
AI is not the first tool to address this mapping problem. Computational biologists have been building predictive models of biological systems for decades, with mixed results. But you can imagine how AI is going to be particularly good at this. I know I'm not the only one to be impressed by the way generative AI can fill in the gaps from partial data. But how much data? What kind of data? How can we generate the best maps from the fewest sampled points?
The analogy of a functional landscape doesn't tell us which biological problems AI is going to solve or how quickly. But it does frame the challenge. My opinion? There are going to be large parts of the landscape that are pretty smooth and predictable, where AI can guide us a long way with little data. There are going to be other places that are crazy rugged, where AI basically doesn't help at all and we need to take real experimental measurements at high density.
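The smooth-versus-rugged intuition can be checked in a toy setting. This sketch assumes a hypothetical one-dimensional landscape of 51 designs, measures every fifth one, and scores how well linear interpolation predicts the designs in between. The smooth curve and the fixed-seed noise are invented; only the comparison matters.

```python
import random

N = 51     # hypothetical designs 0..50 along one axis of genetic change
STEP = 5   # measure every 5th design

def smooth(x: int) -> float:
    # A single broad hump: gentle, predictable changes between neighbors.
    return -((x - 25) ** 2) / 10

# Fixed-seed noise so the rugged landscape is identical on every run.
_rng = random.Random(1)
_noise = [_rng.uniform(-10, 10) for _ in range(N)]

def rugged(x: int) -> float:
    # Same trend, but each design jumps up or down unpredictably.
    return smooth(x) + _noise[x]

def mean_interpolation_error(f, step: int) -> float:
    """Average |prediction - truth| over the designs we did NOT measure."""
    xs = list(range(0, N, step))
    errors = []
    for left, right in zip(xs, xs[1:]):
        for x in range(left + 1, right):
            t = (x - left) / (right - left)
            guess = f(left) + t * (f(right) - f(left))
            errors.append(abs(guess - f(x)))
    return sum(errors) / len(errors)

print(round(mean_interpolation_error(smooth, STEP), 3))  # 0.5
print(round(mean_interpolation_error(rugged, STEP), 3))
```

With the same sampling budget, the map of the smooth region is nearly exact while the map of the rugged region is mostly noise; there, the only reliable option is to measure the designs themselves.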
Using AI effectively means being able to roam the wide open spaces of biology and also dig into the little nooks and crannies.