TY - JOUR
T1 - Ontology-based semi-supervised conditional random fields for automated information extraction from bridge inspection reports
AU - Liu, Kaijian
AU - El-Gohary, Nora
N1 - Publisher Copyright: © 2017 Elsevier B.V.
PY - 2017/9
Y1 - 2017/9
N2 - A large amount of detailed data about bridge conditions and maintenance actions are buried in bridge inspection reports without being used. Information extraction and data analytics open opportunities to leverage this wealth of data for improved bridge deterioration prediction and enhanced maintenance decision making. This paper proposes a novel ontology-based, semi-supervised conditional random fields (CRF)-based information extraction methodology for extracting information entities describing existing deficiencies and performed maintenance actions from bridge inspection reports. The ontology facilitates the analysis of the text based on content and domain-specific meaning. The proposed semi-supervised CRF simultaneously captures the dependency structures as well as the distributions of labeled and unlabeled data in a concave machine-learning function. It learns from a small set of fixed labeled data and, at the same time, dynamically adapts itself to unseen instances by further learning from a large set of unlabeled data for both reduced human effort and high performance. The proposed algorithm achieved an average precision, recall, and F-1 measure of 94.1%, 87.7%, and 90.7%, respectively.
AB - A large amount of detailed data about bridge conditions and maintenance actions are buried in bridge inspection reports without being used. Information extraction and data analytics open opportunities to leverage this wealth of data for improved bridge deterioration prediction and enhanced maintenance decision making. This paper proposes a novel ontology-based, semi-supervised conditional random fields (CRF)-based information extraction methodology for extracting information entities describing existing deficiencies and performed maintenance actions from bridge inspection reports. The ontology facilitates the analysis of the text based on content and domain-specific meaning. The proposed semi-supervised CRF simultaneously captures the dependency structures as well as the distributions of labeled and unlabeled data in a concave machine-learning function. It learns from a small set of fixed labeled data and, at the same time, dynamically adapts itself to unseen instances by further learning from a large set of unlabeled data for both reduced human effort and high performance. The proposed algorithm achieved an average precision, recall, and F-1 measure of 94.1%, 87.7%, and 90.7%, respectively.
KW - Bridges
KW - Conditional random fields
KW - Deterioration prediction
KW - Information extraction
KW - Maintenance decision making
KW - Ontology
KW - Semi-supervised machine learning
UR - http://www.scopus.com/inward/record.url?scp=85019933777&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85019933777&partnerID=8YFLogxK
U2 - 10.1016/j.autcon.2017.02.003
DO - 10.1016/j.autcon.2017.02.003
M3 - Article
AN - SCOPUS:85019933777
SN - 0926-5805
VL - 81
SP - 313
EP - 327
JO - Automation in Construction
JF - Automation in Construction
ER -