Fusing Data Extracted from Bridge Inspection Reports for Enhanced Data-Driven Bridge Deterioration Prediction: A Hybrid Data Fusion Method

Kaijian Liu, Nora El-Gohary

Research output: Contribution to journal; Article; peer-reviewed

14 Scopus citations

Abstract

Data buried in textual bridge inspection reports offer great promise for enhanced data-driven bridge deterioration prediction. However, learning from these reports is challenging because they typically use multiple concept names to refer to the same entity and typically describe multiple instances of the same type of deficiency. Such multiple names and instances increase the dimensionality and the sparsity of the feature space, which would cause overfitting to a particular feature, undermine the generalizability of the machine learning models, and compromise the performance of the data-driven prediction. To address this challenge, this paper proposes a new hybrid data fusion method. It combines an unsupervised named entity normalization method and an entropy-based numerical data fusion method to fuse concept names and numerical data, respectively. The proposed normalization method uses an n-gram model to generate candidate canonical identifier names and utilizes corpus statistics and lexical patterns to fuse the multiple concept names into a candidate name that balances abstraction and detailedness. The proposed fusion method uses data discretization and information entropy to fuse the multiple deficiency measures (of the instances) into a single representation. The hybrid fusion method was validated in fusing data extracted from textual bridge inspection reports for supporting the prediction of future bridge condition ratings. Learning from the fused data, compared to learning from the unfused data, improved the accuracies of predicting the ratings of decks, superstructures, and substructures by 8.0%, 8.5%, and 7.9%, respectively.
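The two ingredients named in the abstract, n-gram candidate generation for concept names and discretization plus information entropy for numerical measures, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the scoring rules, the bin edges, or the form of the fused representation, so every function name, the containment-count scoring, and the `(most common bin, entropy)` output below are assumptions made purely for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=3):
    """Return all contiguous n-grams (as tuples) of length 1..n_max."""
    return [tuple(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def candidate_names(concept_names, n_max=3):
    """Count, for each n-gram, how many concept names contain it.

    Assumption: a simple containment count stands in for the corpus
    statistics and lexical patterns mentioned in the abstract.
    """
    counts = Counter()
    for name in concept_names:
        counts.update(set(ngrams(name.lower().split(), n_max)))
    return counts


def discretize(value, edges):
    """Map a numeric value to a bin index, given ascending bin edges."""
    return sum(value >= e for e in edges)


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def fuse_measures(values, edges):
    """Summarize several measures as (most common bin, entropy of bins).

    Assumption: this pair is a hypothetical fused representation; the
    paper's actual representation is not described in the abstract.
    """
    bins = [discretize(v, edges) for v in values]
    most_common_bin = Counter(bins).most_common(1)[0][0]
    return most_common_bin, shannon_entropy(bins)
```

For example, `candidate_names(["deck crack", "transverse deck crack", "deck cracking"])` counts the unigram `("deck",)` in all three names and the bigram `("deck", "crack")` in two of them, which hints at how a shared, more abstract name could be preferred over a longer, more detailed one. Likewise, `fuse_measures([0.2, 0.4, 1.5, 0.3], [1.0, 2.0])` collapses four hypothetical crack-width readings into one bin index plus an entropy value that reflects how spread out the readings are across bins.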

Original language: English
Article number: 04020047
Journal: Journal of Computing in Civil Engineering
Volume: 34
Issue number: 6
State: Published, 1 Nov 2020

Keywords

  • Bridges
  • Data fusion
  • Data-driven approach
  • Deterioration prediction
  • Machine learning
  • Named entity normalization
