Cambridge Healthtech Institute's Inaugural

ML and Predictive Methods in Analytical Development

Addressing Development Challenges Before They Arise

January 13, 2025 ALL TIMES PST

The growth of biotherapeutics to include a vast spectrum of formats and modalities in a wider variety of indications has sharpened focus on the biophysical challenges that must be addressed before the potential of these molecules can be realized in the clinic. Rapid advancement of computing capability and algorithmic tools in recent years offers an exciting route towards surmounting these challenges. In the ML and Predictive Methods in Analytical Development Symposium, we will explore new technologies and strategies for circumventing issues with aggregation, stability, biotransformation, and other key biophysical properties in silico. To this end, we will investigate the collection, organization, and integration of diverse data types to create predictive models, examine the role of structure- and simulation-guided predictive modeling in the space, and identify the outstanding gaps that remain in the field.

Monday, January 13

8:00 am Registration and Morning Coffee

8:50 am Organizer's Welcome Remarks

Govinda Sharma, PhD, Conference Producer, Cambridge Healthtech Institute

DATA-DRIVEN PREDICTION IN ANALYTICAL DEVELOPMENT

8:55 am

Chairperson's Remarks

Kyle A. Barlow, PhD, Senior Scientist, Computational Biology, Adimab LLC

9:00 am

High-Throughput Developability Strategies to Support the Modern Pipeline and ML Models

Gilad Kaplan, PhD, Director, Biologics Engineering, AstraZeneca

Early developability screens are used to predict the downstream biophysical characteristics and manufacturability of candidate drug biologics. To realize the full potential of early developability screens, a fully automatable, predictive, and high-throughput developability screen is needed. We present our data-driven approach to increasing the throughput of the early developability phase to accommodate a growing pipeline and generate the data needed to construct in silico developability prediction models.

9:30 am

The Determinants of Aggregation in Small Protein Domains

Cydney M. Martell, PhD Candidate, Pharmacology, Northwestern University

Predicting protein aggregation remains difficult, limiting the use of proteins in biotechnology and therapeutic applications. We aim to design aggregation resistance by collecting and learning from large, experimentally validated datasets. I quantified aggregation after thermal and pH stress for thousands of small protein domains using mass spectrometry. I’m developing machine learning models to predict aggregation from protein features. Through iterative experiments and design, I will refine my model to achieve unprecedented aggregation resistance.

10:00 am

Developability Assessment in Early Therapeutic Antibody Discovery by Integrating Machine Learning and High-Throughput Bioanalytical Assays

Dalton Markrush, Scientist, Global Bioanalytics, Alloy Therapeutics

Selection of highly developable leads is crucial for clinical translation and requires accurate developability assessments benchmarked against the clinical landscape. Combining large in vitro datasets with in silico tools, we have developed integrated wet lab and dry lab workflows that enable rational selection of both assays and candidates. The resulting developability pipeline enables efficient identification of highly developable leads with consideration of specific downstream risks.


10:30 am Developability Profiles of Antibodies Discovered from Specifica’s Generation 3 VHH, Fab, and scFv Libraries

Frank Erasmus, Director, Bioinformatics, Specifica, an IQVIA business

This presentation provides an overview of the developability profiles of antibodies discovered from Specifica’s Generation 3 VHH, Fab, and scFv libraries, selected using phage and yeast display. Antibodies are assessed as VHH/VHH-Fc or IgG, with many exhibiting strong affinities and specificities to their targets. Data show that a large panel of leads perform comparably to the late-stage or clinically approved antibodies from which they are derived in terms of developability. Metrics such as self-interaction, aggregation, polyspecificity/polyreactivity, hydrophobicity, and thermal stability support the conclusion that drug-like antibodies can be selected directly without requiring further optimization.

11:00 am Networking Coffee Break

11:15 am

A Step towards Understanding and Predicting High-Concentration Antibody Liabilities

Jonathan Zarzar, Senior Principal Scientist and Group Leader, Pharmaceutical Development, Genentech, Inc.

The increase in biologics administered subcutaneously has required higher protein concentrations and highlighted liabilities such as protein aggregation, precipitation, and high viscosity. Identifying optimal high-concentration formulations that limit these liabilities can be slow and costly, and often prevents therapeutics from moving rapidly into the clinic/market. Here, we present advances that have been made in understanding high-concentration protein behavior as well as interesting case studies.

11:45 am

Machine Learning Methods for Integrated Developability Predictions in Early-Stage Antibody Discovery

Kyle A. Barlow, PhD, Senior Scientist, Computational Biology, Adimab LLC

Initial antibody discovery generates molecules with a wide range of biophysical characteristics that can be used to predict developability, presenting an opportunity to filter or improve their properties. We present machine learning models for developability predictions of properties such as hydrophobicity, chemical stability, and viscosity, and explain how they are deployed to obtain actionable information. We describe the generation and benchmarking of the models and associated experimental input training data.

12:15 pm LUNCHEON PRESENTATION: Unlocking Antibody Developability Predictions with High-Throughput Data Generation and AI Modeling

Seth Ritter, Head of AI Business Unit, Ginkgo AI, Ginkgo Bioworks

Antibody developability is critical for therapeutic discovery, requiring high-quality data and advanced analytics to optimize candidates. At Ginkgo Bioworks, we’ve scaled developability workflows to high-throughput, generating robust datasets for biophysical and functional characterization. Our AI team leverages over a decade of expertise to build predictive models for antibody developability, offering end-to-end or modular solutions. Join us to learn how Ginkgo is participating in the revolution in biologics with flexible, innovative approaches.

12:45 pm Session Break

SIMULATION AND STRUCTURE-BASED APPROACHES FOR BIOTHERAPEUTIC DEVELOPMENT

2:00 pm

Chairperson's Remarks

Salvador Ventura, PhD, Full Professor, Biochemistry and Molecular Biology, Autonomous University of Barcelona

2:05 pm

Aggrescan4D: Structure-Informed Analysis of pH-Dependent Protein Aggregation

Salvador Ventura, PhD, Full Professor, Biochemistry and Molecular Biology, Autonomous University of Barcelona

Protein aggregation impacts industrial protein production and formulation. Aggrescan3D (A3D) was developed to aid in understanding and engineering aggregation in globular proteins. It has become one of the most popular structure-based predictors for aggregation studies and protein redesign. Here, we present Aggrescan4D (A4D), which largely extends A3D’s functionality by incorporating pH-dependent aggregation prediction and an evolutionarily informed automatic mutation protocol to engineer protein solubility.

2:35 pm

Computational Design of Membrane Protein Stability, Recognition, and de novo TM Regulatory Adaptors

Marco Mravic, PhD, Assistant Professor, Department of Integrative Structural and Computational Biology, Scripps Research Institute

Because membrane proteins can be structurally dynamic, they are often unstable, and the mechanisms of their many conformations are difficult to discern. The chemical intuition and tools for engineering and designing proteins in lipid are still far from the advanced capabilities available for water-soluble proteins. Our group uses de novo design of simple transmembrane (TM) proteins for rapid Build-Test-Learn cycles to develop software that reliably encodes protein stability and molecular recognition for membrane protein engineering. A generative sequence design method focused on TM interactions in lipid was tested by design and in vitro folding of >20 synthetic TM protein assemblies, most of which folded correctly. A few reached hyper-stability, remaining folded in high SDS, high temperature, etc., providing feedback on which molecular features and patterns underlie idealized sequence-structure principles between TM spans. With improved principles, we devised an in silico approach to design de novo TM proteins that bind and recognize target membrane proteins directly by their TM spans to modulate structure and function. Past work proved we can recognize and functionally regulate single-pass proteins, such as integrins and the EPO cytokine receptor, by their TM spans. We recently advanced the technology to correct misfolding of an ion channel and to target GPCRs. These tools and chemical “rules” have advanced targeting and stabilization approaches for membrane proteins.

CURATED DATA AND DATABASES IN PREDICTING DEVELOPABILITY

3:05 pm

Chairperson's Remarks

Salvador Ventura, PhD, Full Professor, Biochemistry and Molecular Biology, Autonomous University of Barcelona

3:10 pm

Development of Clinically Relevant Specifications for Biologics

Siddarth Prabhu, Process Development Scientist, Attribute Sciences, Amgen

Biological relevance tools are essential for gaining knowledge about attribute impact, which can be used to inform clinically relevant specifications. A new data science method called the Clinical Impact of Attributes (CIA) approach will be shared that uses clinical trial information to justify clinically safe specifications. CIA analyzes clinical studies to determine whether any correlations exist between the attribute levels to which patients are exposed and the development of adverse events. Several case studies will be shown.

3:40 pm Networking Refreshment Break

APP WORKSHOP MEET-UP

3:45 pm

App Workshop: Tips for Successfully Navigating the PepTalk App for Your Onsite Experience

Kevin Brawley, Project Manager, Production Operations & Communications, Cambridge Innovation Institute

Julie Sullivan, Production, Cambridge Innovation Institute

Looking to maximize your onsite experience? Want to connect with fellow attendees? Need help viewing the app? Come join us for the App Workshop! We will share tips for navigating the app so you can get the most out of the event.

4:00 pm

Leveraging a Database of Therapeutic Antibodies to Design Novel Therapeutics with De-Risked Developability Profiles

Oliver Turnbull, PhD Candidate, Department of Statistics, University of Oxford

Approved therapeutic antibodies provide valuable insights into which biophysical properties can be considered safe from a developability perspective, aiding the design of biotherapeutics with de-risked developability profiles. I will present our work on building a database of therapeutic antibodies (TheraSAbDab), using this to develop a predictive tool for developability risk (Therapeutic Antibody Profiler 2), and finally our generative machine learning model (p-IgGen) for creating developability-conditioned in silico screening libraries.

4:30 pm

It’s Going to Take a Village: Standardizing Analytics for Better Machine Learning

Michael S. Marlow, PhD, Director, Biologics CMC Research, Biotherapeutics Discovery, Boehringer Ingelheim Pharmaceuticals, Inc.

Effective machine learning (ML) relies on high-quality data and standardized analysis procedures. We will explore the critical need for a collaborative, community-driven approach to standardizing ML analytics and contemplate strategies for producing better data. By establishing best practices and shared resources, development groups across the industry will be empowered to efficiently integrate different data types and leverage the full toolbox of ML techniques, ensuring reproducibility, interpretability, and robust model performance.

5:00 pm

Checking Your Peptides in Databases: Complexities and Quirks

Christopher Southan, PhD, Honorary Professor, Deanery of Biomedical Sciences, University of Edinburgh

Public databases of sequences and bioactivity data, including from patent extractions, are a crucial but overlooked resource for peptide researchers. Because natural endogenous or designed therapeutic peptides fall between the formal representations of small-molecule cheminformatics and protein-sequence bioinformatics, they have database representational challenges that make them difficult to find. This presentation will review various sources of peptide entries in PubChem and offer searching tips.

5:30 pm Close of Symposium