TY - JOUR
T1 - Rapid discovery of Transglutaminase 2 inhibitors for celiac disease with boosting ensemble machine learning
AU - Wichka, Ibrahim
AU - Lai, Pin Kuang
N1 - Publisher Copyright: © 2024 The Authors
PY - 2024/12
Y1 - 2024/12
N2 - Celiac disease poses a significant health challenge for individuals consuming gluten-containing foods. While the availability of gluten-free products has increased, there is still a need for therapeutic treatments. The advancement of computational drug design, particularly using bio-cheminformatics-oriented machine learning, offers promising avenues for developing such therapies. One promising target is Transglutaminase 2 (TG2), a protein involved in the autoimmune response triggered by gluten consumption. In this study, we utilized data from approximately 1100 TG2 inhibition assays to develop ligand-based molecular screening techniques using ensemble machine-learning models and extensive molecular feature libraries. Various classifiers, including tree-based methods, artificial neural networks, and graph neural networks, were evaluated to identify primary systems for predictive analysis and feature significance assessment. Boosting ensembles of perceptron deep learning and low-depth random forest weak learners emerged as the most effective, achieving over 90 % accuracy, significantly outperforming a baseline of 64 %. Key features, such as the presence of a terminal Michael acceptor group and a sulfonamide group, were identified as important for activity. Additionally, a regression model was created to rank active compounds. We developed a web application, Celiac Informatics (https://celiac-informatics-v1-2b0a85e75868.herokuapp.com), to facilitate the screening of potential therapeutic molecules for celiac disease. The web app also provides drug-likeness reports, supporting the development of novel drugs.
AB - Celiac disease poses a significant health challenge for individuals consuming gluten-containing foods. While the availability of gluten-free products has increased, there is still a need for therapeutic treatments. The advancement of computational drug design, particularly using bio-cheminformatics-oriented machine learning, offers promising avenues for developing such therapies. One promising target is Transglutaminase 2 (TG2), a protein involved in the autoimmune response triggered by gluten consumption. In this study, we utilized data from approximately 1100 TG2 inhibition assays to develop ligand-based molecular screening techniques using ensemble machine-learning models and extensive molecular feature libraries. Various classifiers, including tree-based methods, artificial neural networks, and graph neural networks, were evaluated to identify primary systems for predictive analysis and feature significance assessment. Boosting ensembles of perceptron deep learning and low-depth random forest weak learners emerged as the most effective, achieving over 90 % accuracy, significantly outperforming a baseline of 64 %. Key features, such as the presence of a terminal Michael acceptor group and a sulfonamide group, were identified as important for activity. Additionally, a regression model was created to rank active compounds. We developed a web application, Celiac Informatics (https://celiac-informatics-v1-2b0a85e75868.herokuapp.com), to facilitate the screening of potential therapeutic molecules for celiac disease. The web app also provides drug-likeness reports, supporting the development of novel drugs.
KW - Celiac disease
KW - Computational drug discovery
KW - Ensemble machine learning
KW - Inhibitor screening
KW - Quantitative structure-activity relationship (QSAR)
KW - Transglutaminase 2
UR - http://www.scopus.com/inward/record.url?scp=85206935990&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85206935990&partnerID=8YFLogxK
U2 - 10.1016/j.csbj.2024.10.019
DO - 10.1016/j.csbj.2024.10.019
M3 - Article
AN - SCOPUS:85206935990
VL - 23
SP - 3669
EP - 3679
JO - Computational and Structural Biotechnology Journal
JF - Computational and Structural Biotechnology Journal
ER -