TY - JOUR
T1 - Machine Learning Feature Selection for Predicting High Concentration Therapeutic Antibody Aggregation
AU - Lai, Pin Kuang
AU - Fernando, Amendra
AU - Cloutier, Theresa K.
AU - Kingsbury, Jonathan S.
AU - Gokarn, Yatin
AU - Halloran, Kevin T.
AU - Calero-Rubio, Cesar
AU - Trout, Bernhardt L.
N1 - Publisher Copyright: © 2020 American Pharmacists Association®
PY - 2021/4
Y1 - 2021/4
AB - Protein aggregation can hinder the development, safety, and efficacy of therapeutic antibody-based drugs. Developing a predictive model that evaluates aggregation behaviors during early-stage development is therefore desirable. Machine learning is a widely used tool to train models that predict data with different attributes. However, most machine learning techniques require more data than is typically available in antibody development. In this work, we describe a rational feature selection framework to develop accurate models with a small number of features. We applied this framework to predict aggregation behaviors of 21 approved monospecific monoclonal antibodies at high concentration (150 mg/mL), yielding a correlation coefficient of 0.71 on validation tests with only two features using a linear model. The nearest neighbors and support vector regression models further improved the performance, with correlation coefficients of 0.86 and 0.80, respectively. This framework can be extended to train other models that predict different physical properties.
KW - Antibody aggregation
KW - Feature selection
KW - Machine learning
KW - Molecular dynamics simulations
UR - http://www.scopus.com/inward/record.url?scp=85098629571&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85098629571&partnerID=8YFLogxK
U2 - 10.1016/j.xphs.2020.12.014
DO - 10.1016/j.xphs.2020.12.014
M3 - Article
C2 - 33346034
AN - SCOPUS:85098629571
SN - 0022-3549
VL - 110
SP - 1583
EP - 1591
JO - Journal of Pharmaceutical Sciences
JF - Journal of Pharmaceutical Sciences
IS - 4
ER -