Cambridge Healthtech Institute recently interviewed Dr. Gabriel Rocklin, Senior Fellow, Biochemistry & Bioengineering, University of Washington, prior to his presentation on “High-Throughput de novo Computational Protein Design and Its Applications” at the Seventh Annual Higher-Throughput Protein Production and Characterization conference (January 11-12, 2018, during the 17th Annual PepTalk event in San Diego, CA).

High-Throughput de novo Computational Protein Design and Its Applications


Advances in computational protein design and DNA synthesis technology have made it possible to design and recombinantly express tens of thousands of small designer proteins (40-80 amino acids) at once, each with a novel de novo fold and unique functional possibilities. These proteins can be engineered to bind to targets and to resist thermal denaturation and aggregation, and we can iteratively improve these properties through cycles of large-scale design and efficient, massively parallel experimental testing.


How do you build de novo folds into proteins?

We start from what we call a "blueprint", which is a kind of abstract schematic for a new protein. The blueprint describes which secondary structure elements will be present, their lengths, how they are ordered in the protein chain, the lengths of the loops that connect them, and the spatial arrangement of beta strands to form beta sheets. From these blueprints, we use our Rosetta software to create a 3D model of the backbone of the protein we want to build, and then search sequence space to find the sequence best suited to that 3D model.
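To make the blueprint idea concrete, here is a minimal sketch of the kind of information such a schematic carries, written as a simplified Python data structure. The class names, the example topology, and the strand pairings are illustrative assumptions; this is not the actual Rosetta blueprint file format or API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Illustrative only: a simplified stand-in for the kind of information a
# design blueprint encodes. It is NOT the actual Rosetta blueprint format.

@dataclass
class SecondaryStructureElement:
    kind: str    # "H" (helix), "E" (strand), or "L" (loop)
    length: int  # number of residues

@dataclass
class Blueprint:
    elements: List[SecondaryStructureElement]
    # Pairs of indices into `elements` marking which strands pair in the
    # beta sheet (the pairings below are made up for illustration).
    strand_pairings: List[Tuple[int, int]] = field(default_factory=list)

    def total_length(self) -> int:
        return sum(e.length for e in self.elements)

# A hypothetical strand-helix-strand-strand-helix-strand topology.
bp = Blueprint(
    elements=[
        SecondaryStructureElement("E", 5), SecondaryStructureElement("L", 2),
        SecondaryStructureElement("H", 12), SecondaryStructureElement("L", 3),
        SecondaryStructureElement("E", 5), SecondaryStructureElement("L", 2),
        SecondaryStructureElement("E", 5), SecondaryStructureElement("L", 3),
        SecondaryStructureElement("H", 12), SecondaryStructureElement("L", 2),
        SecondaryStructureElement("E", 5),
    ],
    strand_pairings=[(0, 4), (4, 6), (6, 10)],
)
print(bp.total_length())  # 56 residues, inside the 40-80 amino acid range
```

In the actual workflow, a schematic like this is the starting point for backbone generation and sequence optimization in Rosetta, rather than an end product in itself.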

How is it possible to engineer proteins that resist thermal denaturation and aggregation?

We have empirically found that de novo designed proteins are much more stable than naturally evolved proteins, which suggests that there is a wide range of protein properties that nature has not yet optimized. The ability to test thousands of designs makes it possible to identify structural properties that correlate with protein phenotypes such as folding stability and aggregation. Once we understand these structure-phenotype linkages, we adapt our design process to create structures with the properties we want.
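As a rough illustration of the structure-phenotype analysis described above, the sketch below correlates a single structural feature computed from design models with measured stability scores across thousands of designs. The feature name and all of the numbers are synthetic placeholders, not data from the group.

```python
import numpy as np

# Placeholder data standing in for per-design feature values (here, a
# hypothetical "buried nonpolar surface area") and experimentally measured
# stability scores. Both are randomly generated for illustration.
rng = np.random.default_rng(0)
n_designs = 5000

buried_npsa = rng.normal(loc=3000.0, scale=400.0, size=n_designs)
stability = 0.002 * buried_npsa + rng.normal(scale=1.0, size=n_designs)

# Correlate the structural feature with the measured phenotype.
r = np.corrcoef(buried_npsa, stability)[0, 1]
print(f"Pearson r between buried nonpolar area and stability: {r:.2f}")
```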

What is the benefit of recombinantly expressing tens of thousands of small designer proteins at once? How does your group handle all that data?

Proteins are such complex beasts that you really want a lot of examples when trying to examine how different features of a protein's structure lead to an overall phenotype such as stability. Even a single mutation in a protein changes many things at once: it can change the amount of hydrophobicity, the propensity of that local region to form a particular secondary structure, the packing density, the accessible conformations of neighboring residues, and so forth. The more data we have, the easier it is to separate signal from noise. We use a variety of tools to analyze our data, from simple regression models all the way up to a custom procedure for fitting the entire Rosetta energy function.
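For a sense of the "simple regression models" end of that spectrum, here is a toy sketch that fits a linear model predicting stability from several per-design structural features. The feature names and data are invented assumptions; the group's actual analysis, including fitting the Rosetta energy function, is far more involved.

```python
import numpy as np

rng = np.random.default_rng(1)
n_designs = 5000
features = ["buried_npsa", "helix_fraction", "net_charge", "packing_density"]

# Synthetic feature matrix and stability scores, used only to show the fit.
X = rng.normal(size=(n_designs, len(features)))
true_weights = np.array([0.8, 0.3, -0.2, 0.5])
y = X @ true_weights + rng.normal(scale=0.5, size=n_designs)

# Ordinary least-squares fit with an intercept column appended.
X1 = np.column_stack([X, np.ones(n_designs)])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

for name, w in zip(features + ["intercept"], coef):
    print(f"{name:>16s}: {w:+.3f}")
```

With thousands of designs per round, even a simple fit like this can separate which structural features track stability and which are noise, which is the point made above about data volume.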