TY - JOUR
T1 - From raw to refined
T2 - Data preprocessing for construction machine learning (ML), deep learning (DL), and reinforcement learning (RL) models
AU - Golazad, Seyede Zahra
AU - Mohammadi, Abbas
AU - Rashidi, Abbas
AU - Ilbeigi, Mohammad
N1 - Publisher Copyright: © 2024
PY - 2024/12/1
Y1 - 2024/12/1
N2 - As the use of predictive models in construction rapidly increases, the need for preprocessing raw construction data has become more critical. This systematic review investigates data preprocessing techniques for machine learning (ML), deep learning (DL), and reinforcement learning (RL) models in the construction domain. Through a comprehensive analysis of 457 studies, the prevalence of six data types (i.e., tabular, image, video frame, time series, text, and point cloud) and their respective preprocessing methods are examined. Key findings reveal data transformation, cleaning, reduction, augmentation, and scaling as fundamental preprocessing categories, with applications varying across data types. The paper highlights knowledge gaps, including limited synthetic data adoption, lack of standardized annotation practices, absence of comprehensive preprocessing frameworks, and need for automated labeling. Furthermore, critical considerations regarding data privacy, security, sharing, and management practices are discussed. The review underscores the pivotal role of robust data preprocessing in enabling reliable predictive models.
KW - Annotation
KW - Construction
KW - Data augmentation
KW - Data preprocessing
KW - Data sharing
KW - Data transformation
KW - Deep learning
KW - Machine learning
KW - Reinforcement learning
KW - Synthetic data
UR - https://www.scopus.com/pages/publications/85206910365
UR - https://www.scopus.com/inward/citedby.url?scp=85206910365&partnerID=8YFLogxK
U2 - 10.1016/j.autcon.2024.105844
DO - 10.1016/j.autcon.2024.105844
M3 - Review article
AN - SCOPUS:85206910365
SN - 0926-5805
VL - 168
JO - Automation in Construction
JF - Automation in Construction
M1 - 105844
ER -