TY - JOUR
T1 - An Active-Learning Framework for Efficient Training and Bias Mitigation in Probabilistic Forecasting of Water Pipeline Failures
AU - Behrooz, Hojat
AU - Ilbeigi, Mohammad
N1 - Publisher Copyright: © 2025 American Society of Civil Engineers.
PY - 2025/10/1
Y1 - 2025/10/1
N2 - Despite recent advancements in forecasting models for water pipe failures, their implementation remains challenging in many urban environments due to data scarcity. Because water pipe breaks are irregular and intermittent events, existing forecasting models rely on large data sets to achieve robust predictive accuracy. However, such data sets are often unavailable due to practical challenges and the high cost of water pipe condition assessments. Furthermore, small and incomplete data sets are prone to bias and imbalance, which further complicates the training process for forecasting models. To address these challenges, this study develops and empirically evaluates a novel active-learning mechanism that enhances forecasting models for water pipeline failure in two key ways: (1) enabling efficient model training with a significantly smaller data set by selecting the most informative observations, and (2) facilitating effective model training despite unbalanced and potentially biased data. The proposed active-learning mechanism is a progressive and iterative process built on four essential components: (1) stratified sampling through a multidimensional clustering mechanism, (2) cluster weight assignment, (3) a probabilistic forecasting model, and (4) a prediction deviation scoring method for each pipe in the test data. The proposed solution was implemented using historical data from Calgary, Canada. The results showed that the proposed active-learning framework, which selects observations for training, enabled an autoregressive deep-learning forecasting model to achieve a precision-recall area under the curve (PR-AUC) of 90% using only 42.5% of the data (6,052 pipes). In contrast, a similar forecasting model trained on randomly selected data required more than 80% of the data set (11,253 pipes) to reach the same predictive performance. These findings validate the effectiveness of the active-learning method in efficiently training forecasting models with small, unbalanced, and potentially biased data sets.
KW - Active learning
KW - Efficient training
KW - Probabilistic forecasting
KW - Stratified sampling
KW - Water pipe break
UR - https://www.scopus.com/pages/publications/105012461466
U2 - 10.1061/JPCFEV.CFENG-5149
DO - 10.1061/JPCFEV.CFENG-5149
M3 - Article
AN - SCOPUS:105012461466
SN - 0887-3828
VL - 39
JO - Journal of Performance of Constructed Facilities
JF - Journal of Performance of Constructed Facilities
IS - 5
M1 - 04025050
ER -