Bringing AI-driven protein-design tools to biologists everywhere

Artificial intelligence is already proving it can accelerate drug development and improve our understanding of disease. But turning that promise into novel treatments requires getting the latest, most powerful models into the hands of scientists.

The problem is that most scientists aren’t machine-learning experts. Now the company OpenProtein.AI is helping scientists stay on the cutting edge of AI with a no-code platform that gives them access to powerful foundation models and a suite of tools for designing proteins, predicting protein structure and function, and training models.

The company, founded by Tristan Bepler PhD ’20 and former MIT associate professor Tim Lu PhD ’07, is already equipping researchers in pharmaceutical and biotech companies of all sizes with its tools, including internally developed foundation models for protein engineering. OpenProtein.AI also offers its platform to scientists in academia for free.

“It’s a really exciting time right now because these models can not only make protein engineering more efficient — which shortens development cycles for therapeutics and industrial uses — they can also enhance our ability to design new proteins with specific traits,” Bepler says. “We’re also thinking about applying these approaches to non-protein modalities. The big picture is we’re creating a language for describing biological systems.”

Advancing biology with AI

Bepler came to MIT in 2014 as part of the Computational and Systems Biology PhD Program, studying under Bonnie Berger, MIT’s Simons Professor of Applied Mathematics. It was there that he realized how little we understand about the molecules that serve as the building blocks of biology.

“We hadn’t characterized biomolecules and proteins well enough to create good predictive models of what, say, a whole genome circuit will do, or how a protein interaction network will behave,” Bepler recalls. “It got me interested in understanding proteins at a more fine-grained level.”

Bepler began exploring ways to predict the chains of amino acids that make up proteins by analyzing evolutionary data. This was before Google DeepMind released AlphaFold, a powerful prediction model for protein structure. The work led to one of the first generative AI models for understanding and designing proteins, which the team calls a protein language model.

“I was really excited about the classical framework of proteins and the relationships between their sequence, structure, and function. We don’t understand those links well,” Bepler says. “So how could we use these foundation models to skip the ‘structure’ component and go straight from sequence to function?”

After earning his PhD in 2020, Bepler entered Lu’s lab in MIT’s Department of Biological Engineering as a postdoc.

“This was around the time when the idea of integrating AI with biology was starting to pick up,” Lu recalls. “Tristan helped us build better computational models for biologic design. We also realized there’s a disconnect between the most cutting-edge tools available and the biologists, who would love to use these things but don’t know how to code. OpenProtein came from the idea of broadening access to these tools.”

Bepler had worked at the forefront of AI as part of his PhD. He knew the technology could help scientists accelerate their work.

“We started with the idea to build a general-purpose platform for doing machine learning-in-the-loop protein engineering,” Bepler says. “We wanted to build something that was user friendly because machine-learning ideas are kind of esoteric. They require implementation, GPUs, fine-tuning, designing libraries of sequences. Especially at that time, it was a lot for biologists to learn.”

OpenProtein’s platform, in contrast, provides an intuitive web interface that lets biologists upload data and carry out protein engineering work with machine learning. It offers a range of open-source models, including PoET, OpenProtein’s flagship protein language model.

PoET, short for Protein Evolutionary Transformer, was trained on protein groups to generate sets of related proteins. Bepler and his collaborators showed it could generalize about evolutionary constraints on proteins and incorporate new information on protein sequences without retraining, allowing other researchers to add experimental data to improve the model.

“Researchers can use their own data to train models and optimize protein sequences, and then they can use our other tools to analyze those proteins,” Bepler says. “People are generating libraries of protein sequences in silico [on computers] and then running them through predictive models to get validation and structural predictors. It’s basically a no-code front-end, but we also have APIs for people who want to access it with code.”

The models help researchers design proteins faster, then decide which ones are promising enough for further lab testing. Researchers can also input proteins of interest, and the models can generate new ones with similar properties.
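For readers who prefer code, the design-then-rank workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not OpenProtein's implementation or API: the function names, the seed sequence, and the scoring rule are all hypothetical, and `toy_score` (the fraction of hydrophobic residues) merely stands in for a trained predictive model.

```python
# Illustrative sketch only: build an in silico library of variants,
# score each one, and keep the top candidates for lab testing.
# All names here are hypothetical; toy_score is NOT a real property predictor.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
HYDROPHOBIC = set("AILMFVW")          # assumption used only by toy_score


def single_point_mutants(seed):
    """Yield every sequence that differs from `seed` at exactly one position."""
    for i, original in enumerate(seed):
        for aa in AMINO_ACIDS:
            if aa != original:
                yield seed[:i] + aa + seed[i + 1:]


def toy_score(sequence):
    """Placeholder for a predictive model: fraction of hydrophobic residues."""
    return sum(residue in HYDROPHOBIC for residue in sequence) / len(sequence)


def rank_candidates(seed, score_fn, top_k=5):
    """Generate a variant library and return the `top_k` highest-scoring sequences."""
    library = list(single_point_mutants(seed))
    return sorted(library, key=score_fn, reverse=True)[:top_k]


if __name__ == "__main__":
    seed = "MKTAYIAKQR"  # made-up sequence for demonstration
    for candidate in rank_candidates(seed, toy_score, top_k=3):
        print(candidate, round(toy_score(candidate), 2))
```

In practice, the scoring function would be a model trained on experimental measurements, and the shortlisted sequences would go on to laboratory validation.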

Since its founding, OpenProtein’s team has continued to add tools to its platform, which is meant to serve researchers regardless of their lab size or resources.

“We’ve tried really hard to make the platform an open-ended toolbox,” Bepler says. “It has specific workflows, but it’s not tied specifically to one protein function or class of proteins. One of the great things about these models is they are very good at understanding proteins broadly. They learn about the whole space of possible proteins.”

Enabling the next generation of therapies

The large pharmaceutical company Boehringer Ingelheim began using OpenProtein’s platform in early 2025. Recently, the companies announced an expanded collaboration that will see OpenProtein’s platform and models embedded into Boehringer Ingelheim’s work as it engineers proteins to treat diseases like cancer and autoimmune or inflammatory conditions.

Last year, OpenProtein also released a new version of its protein language model, PoET-2, that outperforms much larger models while using a small fraction of the computing resources and experimental data.

“We really want to solve the question of how we describe proteins,” Bepler says. “What’s the meaningful, domain-specific language of protein constraints we use as we generate them? How can we bring in more evolutionary constraints? How can we describe an enzymatic reaction a protein carries out such that a model can generate sequences to do that reaction?”

Moving forward, the founders hope to build models that factor in the changing, interconnected nature of protein function.

“The area I am excited about is going beyond protein binding events to use these models to predict and design dynamic features, where the protein has to engage two, three, or four biological mechanisms at the same time, or change its function after binding,” says Lu, who currently serves in an advisory role for the company.

As progress in AI races forward, OpenProtein continues to see its mission as giving scientists the best tools to develop new treatments faster.

“As work gets more complex, with approaches incorporating things like protein logic and dynamic therapies, the existing experimental toolsets become limiting,” Lu says. “It’s really important to create open ecosystems around AI and biology. There’s a risk that AI resources could get so concentrated that the average researcher can’t use them. Open access is super important for the scientific field to make progress.”