TY - JOUR
T1 - Enhancing ground classification models for TBM tunneling
T2 - Detecting label errors in datasets
AU - Mostafa, Saadeldin
AU - Sousa, Rita L.
N1 - Publisher Copyright: © 2024 Elsevier Ltd
PY - 2024/6
Y1 - 2024/6
N2 - Tunnel Boring Machine (TBM) construction, particularly with closed-face TBMs, faces uncertainties due to the inability of the operator to directly observe the ground ahead. These uncertainties can lead to time delays, cost overruns, and accidents. While supervised machine learning techniques have been used to predict geology from TBM sensor data, their performance drops significantly when applied to other projects, indicating poor generalization. To ensure accurate results and improved generalization to future data, supervised learning models require high-quality, well-labeled data, which is not usually available for TBM datasets. This paper addresses the issue of “noisy” labels in TBM datasets, which human operators and engineers often label with varying interpretations. A data-centric framework was adapted and applied to an Earth Pressure Balance Machine (EPBM) tunnel dataset to detect and identify these mislabeled datapoints. The framework's outputs were validated using two techniques, and several methods were applied to clean the dataset. The best-performing method was selected for the test set. The paper concludes by discussing the limitations of the proposed method, the challenges encountered, and future research directions in this area.
AB - Tunnel Boring Machine (TBM) construction, particularly with closed-face TBMs, faces uncertainties due to the inability of the operator to directly observe the ground ahead. These uncertainties can lead to time delays, cost overruns, and accidents. While supervised machine learning techniques have been used to predict geology from TBM sensor data, their performance drops significantly when applied to other projects, indicating poor generalization. To ensure accurate results and improved generalization to future data, supervised learning models require high-quality, well-labeled data, which is not usually available for TBM datasets. This paper addresses the issue of “noisy” labels in TBM datasets, which human operators and engineers often label with varying interpretations. A data-centric framework was adapted and applied to an Earth Pressure Balance Machine (EPBM) tunnel dataset to detect and identify these mislabeled datapoints. The framework's outputs were validated using two techniques, and several methods were applied to clean the dataset. The best-performing method was selected for the test set. The paper concludes by discussing the limitations of the proposed method, the challenges encountered, and future research directions in this area.
KW - Data-centric geotechnics
KW - Database
KW - EPBM tunnel
KW - Machine learning
KW - Noisy labels
UR - http://www.scopus.com/inward/record.url?scp=85189495985&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85189495985&partnerID=8YFLogxK
U2 - 10.1016/j.compgeo.2024.106301
DO - 10.1016/j.compgeo.2024.106301
M3 - Article
AN - SCOPUS:85189495985
SN - 0266-352X
VL - 170
JO - Computers and Geotechnics
JF - Computers and Geotechnics
M1 - 106301
ER -