From raw to refined: Data preprocessing for construction machine learning (ML), deep learning (DL), and reinforcement learning (RL) models

Seyede Zahra Golazad, Abbas Mohammadi, Abbas Rashidi, Mohammad Ilbeigi

Research output: Contribution to journal › Review article › peer-review

19 Scopus citations

Abstract

As the use of predictive models in construction rapidly increases, the need for preprocessing raw construction data has become more critical. This systematic review investigates data preprocessing techniques for machine learning (ML), deep learning (DL), and reinforcement learning (RL) models in the construction domain. Through a comprehensive analysis of 457 studies, the prevalence of six data types (i.e., tabular, image, video frame, time series, text, and point cloud) and their respective preprocessing methods are examined. Key findings reveal data transformation, cleaning, reduction, augmentation, and scaling as fundamental preprocessing categories, with applications varying across data types. The paper highlights knowledge gaps, including limited synthetic data adoption, lack of standardized annotation practices, absence of comprehensive preprocessing frameworks, and need for automated labeling. Furthermore, critical considerations regarding data privacy, security, sharing, and management practices are discussed. The review underscores the pivotal role of robust data preprocessing in enabling reliable predictive models.

Original language: English
Article number: 105844
Journal: Automation in Construction
Volume: 168
State: Published - 1 Dec 2024

Keywords

  • Annotation
  • Construction
  • Data augmentation
  • Data preprocessing
  • Data sharing
  • Data transformation
  • Deep learning
  • Machine learning
  • Reinforcement learning
  • Synthetic data
