TY - JOUR
T1 - Effect of dataset representation bias on generalizability of machine learning models in predicting flexural properties of ultra-high-performance concrete (UHPC) beams
AU - Chen, Jinxin
AU - Bao, Yi
N1 - Publisher Copyright: © 2024 Elsevier Ltd
PY - 2025/3/1
Y1 - 2025/3/1
N2 - Machine learning (ML) offers transformative potential in structural design through its high efficiency in exploring optimal solutions within vast design spaces. However, concerns persist among structural engineers regarding the generalizability and reliability of ML models trained on biased datasets. This study investigates the influence of dataset representation bias on the generalizability of ML models in predicting the flexural properties of ultra-high-performance concrete beams. The research addresses three primary objectives: (1) developing a novel metric to quantify representation bias of continuous datasets, (2) devising an algorithm to create datasets with controlled bias levels, and (3) evaluating the effect of dataset representation bias on the generalizability of ML models. Key contributions include the first comprehensive analysis of representation bias in structural prediction models, the introduction of a Monte Carlo Bias Estimation method for evaluating dataset bias, the development of an Adaptive Bias Sampling Algorithm for dataset generation, and the modification of Latin Hypercube Sampling to ensure uniform dataset distribution. Findings reveal that dataset bias significantly undermines the generalizability of ML models, and the proposed methods offer effective strategies for assessing and mitigating dataset bias, thereby enhancing the generalizability of ML models.
KW - Data-driven modeling
KW - Flexural properties
KW - Generalizability
KW - Machine learning (ML)
KW - Representation bias
KW - Structural design
UR - http://www.scopus.com/inward/record.url?scp=85212342523&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85212342523&partnerID=8YFLogxK
U2 - 10.1016/j.engstruct.2024.119508
DO - 10.1016/j.engstruct.2024.119508
M3 - Article
AN - SCOPUS:85212342523
SN - 0141-0296
VL - 326
JO - Engineering Structures
JF - Engineering Structures
M1 - 119508
ER -