An Active-Learning Framework for Efficient Training and Bias Mitigation in Probabilistic Forecasting of Water Pipeline Failures

Research output: Contribution to journal › Article › peer-review

Abstract

Despite recent advancements in forecasting models for water pipe failures, their implementation remains challenging in many urban environments due to data scarcity. Because water pipe breaks are irregular and intermittent events, existing forecasting models rely on large data sets to achieve robust predictive accuracy. However, such data sets are often unavailable due to practical challenges and the high cost of water pipe condition assessments. Furthermore, small and incomplete data sets are prone to bias and imbalance, which further complicates the training process for forecasting models. To address these challenges, this study develops and empirically evaluates a novel active-learning mechanism that enhances forecasting models for water pipeline failure in two key ways: (1) enabling efficient model training with a significantly smaller data set by selecting the most informative observations, and (2) facilitating effective model training despite unbalanced and potentially biased data. The proposed active-learning mechanism is a progressive and iterative process built on four essential components: (1) stratified sampling through a multidimensional clustering mechanism, (2) cluster weight assignment, (3) a probabilistic forecasting model, and (4) a prediction deviation scoring method for each pipe in the test data. The proposed solution was implemented using historical data from Calgary, Canada. The results showed that the proposed active-learning framework, which selects observations for training, enabled an autoregressive deep-learning forecasting model to achieve a precision-recall area under the curve (PR-AUC) of 90% using only 42.5% of the data (6,052 pipes). In contrast, a similar forecasting model trained on randomly selected data required more than 80% of the data set (11,253 pipes) to reach the same predictive performance. 
These findings validate the effectiveness of the active-learning method in efficiently training forecasting models with small, unbalanced, and potentially biased data sets.
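
The four components listed in the abstract can be pictured as one loop. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the synthetic data, the binning used in place of the multidimensional clustering, the smoothed per-stratum break rate used in place of the autoregressive deep-learning model, the deviation score |observed - predicted|, and every function and parameter name (`make_pipes`, `stratum`, `fit`, `active_learning`, `rounds`, `batch`) are hypothetical choices made for illustration only.

```python
import random
from collections import defaultdict


def make_pipes(n, seed=0):
    """Synthetic pipes as (age_years, diameter_mm, broke). Illustrative data only."""
    rng = random.Random(seed)
    pipes = []
    for _ in range(n):
        age = rng.uniform(0, 100)
        diameter = rng.choice([100, 150, 200, 300])
        p_break = min(0.9, 0.002 * age + (0.1 if diameter <= 150 else 0.0))
        pipes.append((age, diameter, int(rng.random() < p_break)))
    return pipes


def stratum(pipe):
    """Assumed stand-in for multidimensional clustering: 25-year age band x diameter."""
    age, diameter, _ = pipe
    return (int(age // 25), diameter)


def fit(train):
    """Assumed stand-in probabilistic model: smoothed break rate per stratum."""
    counts = defaultdict(lambda: [1.0, 2.0])  # prior of 1 break in 2 pipes
    for pipe in train:
        c = counts[stratum(pipe)]
        c[0] += pipe[2]
        c[1] += 1
    return lambda pipe: counts[stratum(pipe)][0] / counts[stratum(pipe)][1]


def active_learning(pipes, rounds=5, batch=20, seed=0):
    """Iteratively grow the training set; returns the selected pipe indices."""
    rng = random.Random(seed)
    pool = list(range(len(pipes)))
    rng.shuffle(pool)
    train_idx, pool = pool[:batch], pool[batch:]  # random initial batch
    for _ in range(rounds):
        predict = fit([pipes[i] for i in train_idx])
        # Deviation score per remaining pipe (assumed form: |observed - predicted|).
        score = {i: abs(pipes[i][2] - predict(pipes[i])) for i in pool}
        # Group remaining pipes by stratum and weight each stratum
        # by its mean deviation score.
        by_stratum = defaultdict(list)
        for i in pool:
            by_stratum[stratum(pipes[i])].append(i)
        weight = {s: sum(score[i] for i in idx) / len(idx)
                  for s, idx in by_stratum.items()}
        total = sum(weight.values()) or 1.0
        # Stratified selection: each stratum contributes its highest-deviation
        # pipes, with a quota proportional to the stratum weight.
        chosen = []
        for s, idx in by_stratum.items():
            quota = round(batch * weight[s] / total)
            idx.sort(key=lambda i: score[i], reverse=True)
            chosen.extend(idx[:quota])
        chosen = chosen[:batch]
        chosen_set = set(chosen)
        train_idx += chosen
        pool = [i for i in pool if i not in chosen_set]
    return train_idx


if __name__ == "__main__":
    pipes = make_pipes(500)
    selected = active_learning(pipes)
    print(f"selected {len(selected)} of {len(pipes)} pipes for training")
```

In this sketch, strata where the current model deviates most from the observed outcomes receive larger quotas in the next round, which is one plausible reading of how cluster weights and deviation scores could steer sampling toward the most informative observations.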

Original language: English
Article number: 04025050
Journal: Journal of Performance of Constructed Facilities
Volume: 39
Issue number: 5
State: Published - 1 Oct 2025

Keywords

  • Active learning
  • Efficient training
  • Probabilistic forecasting
  • Stratified sampling
  • Water pipe break
