Proteins are the molecules that get the work done in nature, and an entire industry has sprung up around modifying and manufacturing them for various purposes. But the process is time-consuming and haphazard; Cradle wants to change that with an AI-powered tool that tells scientists which new structures and sequences will make a protein do what they want. The company emerged from stealth today with a substantial seed round.
AI and proteins have been in the news lately, largely thanks to the efforts of research teams like DeepMind and Baker Lab. Their machine learning models take in easily collected sequence data and predict the structure a protein will take, a step that used to require weeks and expensive special equipment.
But as incredible as that capability is in some domains, it’s just the starting point for others. Modifying a protein to become more stable or to bind to a particular other molecule involves much more than just understanding its general shape and size.
“If you’re a protein engineer and you want to design a certain property or function in a protein, it doesn’t help to just know what it looks like. If you have a picture of a bridge, you can’t tell if it’s going to fall down or not,” explained Cradle CEO and co-founder Stef van Grieken.
“AlphaFold takes a sequence and predicts what the protein will look like,” he continued. “We are the generative brother of that: you choose the properties you want to engineer, and the model generates sequences that you can test in your lab.”
Predicting what proteins, especially those new to science, will do in situ is a difficult task for many reasons, but in the context of machine learning the main problem is that there is not enough data available. So Cradle generated much of its own dataset in a wet lab, testing protein after protein and seeing which changes in their sequences seemed to lead to which effects.
Interestingly, the model itself is not biotech-specific but a derivative of the same “large language models” that have produced text generation engines such as GPT-3. Van Grieken noted that these models are not strictly limited to language in how they understand and predict data, an interesting “generalization” feature that researchers are still exploring.
Of course, the protein sequences that Cradle takes in and predicts aren’t in any language we’re familiar with, but they’re relatively simple linear strings of text with associated meanings. “It’s like an alien programming language,” van Grieken said.
Protein engineers are not helpless, of course, but their work necessarily involves a lot of guesswork. One can be certain that among the 100 sequences they are modifying is the combination that will produce
The model works in three basic layers, he explained. First, it assesses whether a given sequence is “natural,” i.e., whether it is a meaningful sequence of amino acids or just random ones. This is similar to a language model that can say with 99 percent certainty that a sentence is in English (or Swedish, in van Grieken’s example), and that the words are in the correct order. It knows this from “reading” millions of such sequences determined by laboratory analysis.
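To make the “natural or random” idea concrete, here is a minimal sketch that assumes a simple bigram count model in place of a large language model. It is not Cradle’s method; the training sequences and the function names `train_bigram_model` and `naturalness` are invented for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def train_bigram_model(sequences):
    """Count how often each residue follows each other residue."""
    pair_counts = Counter()
    first_counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    return pair_counts, first_counts

def naturalness(seq, model):
    """Average log-probability per residue transition, with add-one smoothing."""
    pair_counts, first_counts = model
    total = 0.0
    for a, b in zip(seq, seq[1:]):
        p = (pair_counts[(a, b)] + 1) / (first_counts[a] + len(AMINO_ACIDS))
        total += math.log(p)
    return total / max(len(seq) - 1, 1)

# Made-up training data, for illustration only.
training = ["MKTAYIAKQR", "MKTAYLAKQR", "MKSAYIAKQK"]
model = train_bigram_model(training)

familiar = naturalness("MKTAYIAKQR", model)   # resembles the training data
scrambled = naturalness("WWCPHHGDNF", model)  # shares no residue pairs with it
```

A sequence built from residue pairs the model has seen scores higher than one built from pairs it has never seen, which is the same kind of judgment, at toy scale, as deciding that a sentence looks like English.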
Next, it examines the actual or potential meaning in the protein’s foreign language. “Imagine we give you a sequence, and this is the temperature at which this sequence falls apart,” he said. “If you do that for many sequences, you can say not just ‘this looks natural’, but ‘this looks like 26 degrees Celsius’. That helps the model figure out which parts of the protein to target.”
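The step from “this looks natural” to “this looks like 26 degrees Celsius” amounts to learning from sequences paired with measurements. The sketch below assumes a nearest-neighbor average over Hamming distance as a stand-in for the real model; the sequences, temperatures and the name `predict_melting_temp` are all hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def predict_melting_temp(seq, labeled, k=2):
    """Average the measured temperatures of the k most similar sequences."""
    nearest = sorted(labeled, key=lambda item: hamming(seq, item[0]))[:k]
    return sum(temp for _, temp in nearest) / len(nearest)

# Invented (sequence, degrees Celsius) pairs, for illustration only.
labeled = [
    ("MKTAYIAKQR", 26.0),
    ("MKTAYLAKQR", 30.0),
    ("MKSAYIAKQK", 41.0),
    ("MRSAYIGKQK", 45.0),
]

estimate = predict_melting_temp("MKTAYIAKQK", labeled, k=2)
```

The more labeled sequences there are, the closer the nearest neighbors tend to be, which is one way to see why the amount of measured data matters so much here.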
The model can then suggest sequences to slot in: essentially educated guesses, but a stronger starting point than starting from scratch. The engineer or lab can then try them out and bring that data back to the Cradle platform, where it can be fed back in and used to tailor the model to the situation.
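The suggest, test and feed-back loop described above can be sketched as follows. This assumes single-point mutations and a placeholder scoring function; `toy_score`, `suggest` and `record_result` are hypothetical names, and nothing here reflects how the Cradle platform actually generates candidates.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def single_mutants(seq):
    """Yield every sequence that differs from seq at exactly one position."""
    for i, original in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != original:
                yield seq[:i] + aa + seq[i + 1:]

def suggest(seq, score, n=3):
    """Return the n single mutants with the highest predicted score."""
    return sorted(single_mutants(seq), key=score, reverse=True)[:n]

def toy_score(seq):
    """Placeholder predictor that counts leucines. Not a real stability model."""
    return seq.count("L")

def record_result(dataset, seq, measured):
    """Append a lab measurement so a later model can be refit on it."""
    dataset.append((seq, measured))

dataset = []
candidates = suggest("MKTAYIAKQR", toy_score, n=3)
for candidate in candidates:
    # Stand-in value; in practice this would come from a wet-lab experiment.
    record_result(dataset, candidate, measured=float(toy_score(candidate)))
```

Each pass through the loop adds measured examples to the dataset, so the next round of suggestions can start from a model that knows more about this particular protein.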
Modifying proteins is useful across biotechnology, from drug design to biomanufacturing, and the journey from vanilla molecule to tailored, effective and efficient molecule can be long and expensive. Any way to shorten it is likely to be welcomed, at the very least by the lab technicians who have to run hundreds of experiments to get just one good result.
Cradle has been operating in stealth and is now emerging after raising $5.5 million in a seed round led by Index Ventures and Kindred Capital, with participation from angels John Zimmer, Feike Sijbesma and Emily Leproust.
Van Grieken said the funding would allow the team to scale up data collection — the more the merrier when it comes to machine learning — and work on the product to make it “more self-serve.”
“Our goal is to reduce the cost and time of getting a bio-based product to market by an order of magnitude,” van Grieken said in the press release, “so that everyone – even ‘two kids in their garage’ – can bring a bio-based product to the market.”