Cambridge Healthtech Institute recently interviewed Dr. Peter Nollert, Business Director, Research and Development, Bio Data Bridges.

Prior to his presentation on “Establishing Protein Production without Reinventing the Wheel: ProteinData.Cloud” at the Tenth Annual Engineering Genes and Hosts conference (January 8-9, 2018, during the 17th Annual PepTalk event in San Diego, CA), Dr. Nollert shared some thoughts on:

Production of New Proteins: Research Challenges and Automation Solutions

What does he consider the biggest challenges researchers face when exploring the production of a new protein?

As he explained, “There are really two different types of challenges researchers face when gearing up to produce a ‘new protein’. Is the protein ‘new’ as in ‘there’s no information available at all’? Of the 19,000 or so protein-coding genes, there are indeed many proteins that have not been produced. And on top of this, there are myriad variants that researchers may want to engineer.”

“So, setting up a successful protein production path in the absence of any related information is challenging, since there are many parameters that could be, and may need to be, explored: for instance, the type of expression system and cell cultivation conditions, the sequence of the recombinant expression construct and the use of purification tags, and the actual chromatographic purification scheme and buffer compatibility. The number of parameters to select is huge and typically requires a sequential scouting approach with an uncertain outcome.”

Dr. Nollert continued, “The situation is a little different for cases where the desired protein is only ‘new to the researcher’, implying that someone else may have explored, and in the best case successfully produced, the protein of interest and described it in a report such as a peer-reviewed publication. In such a case the challenge shifts to reproducing the production of a protein sample with the resources available to the researcher. Even with literature precedent, adjustments to the reported production conditions are typically necessary to reproduce a protein sample.”

He added, “In both cases…researchers face uncertainties with respect to time and resource requirements, as well as in estimating yield and final sample characteristics. The scientific reason for these challenges is that every protein is different and many need a bespoke treatment regime. That’s why the production of a new protein sample is a research project rather than an engineering exercise that can be scoped in terms of narrowly defined costs and deliverables.”

Dr. Nollert also shared how, with the goal of making recombinant protein production more efficient, he came to focus on software and automation. “As a trained biochemist I’ve seen firsthand how difficult it can be to prepare high-quality protein samples. And I’ve discussed with many fellow protein researchers over the years where the trouble spots are.”

What are those problem areas?

“A key problem, they tell me, is that most protein production information never actually gets published. This includes negative-outcome experiments, failed preps, low expressers, and protein samples that are used as tools or that are deemed to have little value at the time. Very little of this becomes available, although the underlying information is often meticulously recorded in digital format. Some estimate that less than 5% of gene-to-protein production data make it into the scientific literature. And a lot of useful information is locked up in a spreadsheet or a lab database where nobody can access it when it is needed.”

As he put it, “This is where databases and software come into play. We have constructed an open-source repository to deposit and access protein production information. We collect spreadsheets and databases that are in use all over the world but that never see the light of day. And we make this information available to the community for free, in the form of a protein production search engine, ProteinData.Cloud.”

Looking ahead to where the field may be in 5-10 years, Dr. Nollert observed, “Right now the protein production field feels a bit like the ‘Wild West’. Every lab scouting for a new protein production path uses its own flavor of how to conduct experiments, how to report outcomes, and what terminology to use to describe experiments.”

“This makes it rather difficult to compare protein expression or protein purification experiments from different sources. I doubt that the community will embrace standards that would help with this issue.”

On the plus side, “I do think that machine learning tools and artificial intelligence algorithms with predictive power will become stronger and help sort through this mishmash of protein production data. Ideally, we’d get to a point where researchers first consult a smart software/database system to devise new expression constructs and tagging schemes, and then use these suggestions to more efficiently generate highly engineered protein samples in the quality and quantity they need.”


Peter Nollert, Ph.D., Business Director, Research and Development, Bio Data Bridges

Recognized expert in structural and membrane protein science. Now on a mission to reduce the effort biomedical researchers need to produce recombinant protein samples by using software and automation. Broad professional experience as Scientist and CTO/VP in drug discovery; expertise in computer-aided protein engineering applications. Currently Co-Founder and Business Director at Bio Data Bridges; previously VP, Membrane Protein Science, at Beryllium, and Chief Technology Officer at Emerald Bio. Education: Ph.D. in Biochemistry (Univ. Tuebingen, Germany); postdocs at Univ. Basel, Stanford Univ. and UCSF. Five US patents and more than 30 papers in peer-reviewed journals.
