Effect of dataset representation bias on generalizability of machine learning models in predicting flexural properties of ultra-high-performance concrete (UHPC) beams

Jinxin Chen, Yi Bao

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Machine learning (ML) offers transformative potential in structural design through its high efficiency in exploring optimal solutions within vast design spaces. However, concerns persist among structural engineers regarding the generalizability and reliability of ML models trained on biased datasets. This study investigates the influence of dataset representation bias on the generalizability of ML models in predicting the flexural properties of ultra-high-performance concrete (UHPC) beams. The research addresses three primary objectives: (1) developing a novel metric to quantify the representation bias of continuous datasets, (2) devising an algorithm to create datasets with controlled bias levels, and (3) evaluating the effect of dataset representation bias on the generalizability of ML models. Key contributions include the first comprehensive analysis of representation bias in structural prediction models, the introduction of a Monte Carlo Bias Estimation method for evaluating dataset bias, the development of an Adaptive Bias Sampling Algorithm for dataset generation, and a modification of Latin Hypercube Sampling to ensure uniform dataset distribution. Findings reveal that dataset bias significantly undermines the generalizability of ML models, and that the proposed methods offer effective strategies for assessing and mitigating dataset bias, thereby enhancing the generalizability of ML models.
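
The abstract does not give the details of the Monte Carlo Bias Estimation method, the Adaptive Bias Sampling Algorithm, or the modified Latin Hypercube Sampling, so none of them is reproduced here. As a rough illustration of the underlying idea only, the sketch below uses textbook Latin Hypercube Sampling and a hypothetical coverage proxy (`coverage_gap`, an assumption introduced for this example, not the authors' metric): the mean distance from uniformly drawn Monte Carlo probe points to their nearest dataset sample. The function names, sample sizes, and the 0.3 scaling that squeezes the biased dataset into one corner of the design space are all illustrative choices.

```python
import numpy as np

def latin_hypercube(n_samples, n_dims, rng):
    """Textbook Latin Hypercube Sampling on the unit hypercube [0, 1)^d."""
    # One random point inside each of n_samples equal-width strata, per dimension.
    u = (np.arange(n_samples)[:, None] + rng.random((n_samples, n_dims))) / n_samples
    # Shuffle the strata independently in each dimension.
    for j in range(n_dims):
        u[:, j] = rng.permutation(u[:, j])
    return u

def coverage_gap(data, n_probes, rng):
    """Hypothetical bias proxy (not the paper's metric): mean distance from
    uniform Monte Carlo probes to their nearest data point.
    Larger values indicate poorer coverage of the design space."""
    probes = rng.random((n_probes, data.shape[1]))
    dists = np.linalg.norm(probes[:, None, :] - data[None, :, :], axis=2)
    return dists.min(axis=1).mean()

rng = np.random.default_rng(0)

# A dataset spread over the whole (normalized) design space.
uniform_data = latin_hypercube(200, 3, rng)
# A deliberately biased dataset: all samples squeezed into one corner.
biased_data = 0.3 * latin_hypercube(200, 3, rng)

gap_uniform = coverage_gap(uniform_data, 2000, rng)
gap_biased = coverage_gap(biased_data, 2000, rng)
print(f"uniform: {gap_uniform:.3f}, biased: {gap_biased:.3f}")
```

Under these assumptions the clustered dataset leaves most probe points far from any sample, so its gap is larger than that of the space-filling dataset. A model trained on such a dataset would be asked to extrapolate over most of the design space, which is the kind of generalizability loss the study examines.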

Original language: English
Article number: 119508
Journal: Engineering Structures
Volume: 326
State: Published - 1 Mar 2025

Keywords

  • Data-driven modeling
  • Flexural properties
  • Generalizability
  • Machine learning (ML)
  • Representation bias
  • Structural design
