Computational protein design (CPD) uses algorithms to search the vast space of possible amino acid sequences for those predicted to fold into a desired structure and perform a specified function. The field was pioneered with physics-based energy functions, most notably in the Rosetta software suite originally developed by David Baker's group, which scores the atomic-level interactions that determine protein stability and conformation. Rosetta has been used to design thousands of novel proteins, from simple monomeric structures to complex multi-component assemblies and catalytic enzymes.
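To make the idea of energy-guided sequence search concrete, the following is a minimal sketch of simulated annealing over sequence space. It is not Rosetta's energy function or API: `toy_energy`, the `buried` burial pattern, the hydrophobic residue set, and all parameter values are hypothetical stand-ins chosen for illustration. The toy score simply rewards hydrophobic residues at buried positions and polar residues at exposed ones.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AFILMVWY")  # simplified, hypothetical classification


def toy_energy(sequence, buried):
    """Hypothetical score (lower is better), NOT a physical energy function.

    Each position contributes -1 if its residue type matches the burial
    pattern (hydrophobic when buried, polar when exposed) and +1 otherwise.
    """
    energy = 0.0
    for residue, is_buried in zip(sequence, buried):
        matches = (residue in HYDROPHOBIC) == is_buried
        energy += -1.0 if matches else 1.0
    return energy


def design_sequence(buried, steps=5000, t_start=2.0, t_end=0.05, seed=0):
    """Search sequence space with Metropolis simulated annealing."""
    rng = random.Random(seed)
    current = [rng.choice(AMINO_ACIDS) for _ in buried]
    current_e = toy_energy(current, buried)
    best, best_e = list(current), current_e

    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        frac = step / max(steps - 1, 1)
        temperature = t_start * (t_end / t_start) ** frac

        # Propose a single-residue substitution.
        pos = rng.randrange(len(current))
        old = current[pos]
        current[pos] = rng.choice(AMINO_ACIDS)
        new_e = toy_energy(current, buried)
        delta = new_e - current_e

        # Metropolis criterion: always accept improvements, sometimes
        # accept worse sequences so the search can escape local minima.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current_e = new_e
            if current_e < best_e:
                best, best_e = list(current), current_e
        else:
            current[pos] = old  # reject the move

    return "".join(best), best_e


# Hypothetical 12-residue pattern alternating buried and exposed positions.
pattern = [True, False] * 6
designed, score = design_sequence(pattern)
print(designed, score)
```

Real design software replaces the toy score with a detailed energy function and also samples side-chain conformations, but the overall loop of proposing a mutation, rescoring, and accepting or rejecting it has the same shape.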
The advent of deep learning has introduced a complementary paradigm to physics-based design. AI-driven approaches from companies such as EvolutionaryScale, Generate Biomedicines, and Cradle learn sequence-structure-function mappings from evolutionary data rather than from explicit physical models. These data-driven methods can capture patterns that are difficult to model with physics alone, including long-range epistatic interactions and the relationship between sequence variation and functional diversity. Tools such as ProteinMPNN, developed by the Baker lab, use message-passing graph neural networks to design sequences for a specified backbone structure. Its authors reported native sequence recovery of about 52%, compared with about 33% for Rosetta, and used it to rescue designs that had previously failed experimentally.
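As a rough illustration of how a backbone structure can be presented to a graph neural network, the sketch below builds a k-nearest-neighbor graph over residues from C-alpha coordinates. This is an assumption-laden simplification and not ProteinMPNN's code: the function name `knn_backbone_graph`, the coordinates, and the choice of `k` are hypothetical, and published models use richer features and much larger neighborhoods.

```python
import math


def knn_backbone_graph(ca_coords, k=3):
    """Link each residue to its k nearest residues by C-alpha distance.

    Returns a dict mapping residue index -> list of neighbor indices,
    ordered from nearest to farthest. A graph neural network would then
    pass messages along these edges.
    """
    graph = {}
    for i, ci in enumerate(ca_coords):
        distances = sorted(
            (math.dist(ci, cj), j)
            for j, cj in enumerate(ca_coords)
            if j != i
        )
        graph[i] = [j for _, j in distances[:k]]
    return graph


# Hypothetical C-alpha coordinates (angstroms) for a five-residue fragment.
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0),
    (3.8, 3.8, 0.0),
]
graph = knn_backbone_graph(coords, k=2)
print(graph[0])
```

Because edges depend on spatial proximity rather than sequence position, residues that are far apart in the chain but close in the fold become direct neighbors, which is one way such models can pick up long-range interactions.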
The integration of CPD into industrial workflows is reshaping several sectors. In pharmaceuticals, companies use CPD to optimize antibody affinity, engineer bispecific formats, and design novel protein scaffolds. In industrial biotechnology, CPD accelerates enzyme engineering by predicting stabilizing mutations and redesigning active sites for new substrates. The growing availability of CPD tools through cloud platforms and open-source software is also widening access to protein engineering, enabling smaller companies and academic laboratories to undertake design projects that previously required specialized computational infrastructure and expertise.