Abstract
Haloacetonitriles (HANs) are highly toxic disinfection byproducts detected in drinking water. In this study, we applied machine learning (ML) to investigate the formation of dichloroacetonitrile (DCAN), the most common HAN, using a large literature-derived dataset. Among four models evaluated, CatBoost demonstrated the best predictive performance. SHapley Additive exPlanations (SHAP) analysis revealed that DCAN formation is not governed solely by individual parameters but is substantially influenced by feature interactions. For instance, while dissolved organic carbon (DOC) is generally positively correlated with DCAN formation, this relationship tends to weaken at higher values of specific ultraviolet absorbance at 254 nm (SUVA254), underscoring the role of non-aromatic fractions in DCAN formation. The interaction between DOC and SUVA254 is further influenced by the disinfectant, with chloramination generally resulting in lower DCAN formation than chlorination. To assess model generalizability, we developed a Reliability Index (RI) framework, which integrates a distributional similarity score (Mahalanobis distance) and an anomaly detection score (One-Class Support Vector Machine) to quantify how representative new data are of the training set. The model showed strong performance on an external dataset when RI values exceeded 0.25. This study demonstrates the potential of ML for uncovering the complex mechanisms driving DCAN formation and introduces RI as a transferable tool for evaluating the generalizability of predictive models.
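The abstract names the two ingredients of the Reliability Index (a Mahalanobis-distance similarity score and a One-Class SVM anomaly score) but does not give the formula that combines them. The Python sketch below is therefore only an illustration of the general idea, not the authors' method. The class name `ReliabilityIndex`, the exponential rescaling of the distance, the logistic rescaling of the SVM score, and the equal-weight average are all assumptions made here, so the 0.25 threshold reported above should not be expected to carry over to this sketch.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


class ReliabilityIndex:
    """Hypothetical RI: mean of a Mahalanobis similarity and a One-Class SVM score."""

    def __init__(self, nu=0.05, gamma="scale"):
        self.scaler = StandardScaler()
        self.ocsvm = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma)

    def _mahalanobis(self, Z):
        # Distance of each row of Z from the training mean.
        diff = Z - self.mean_
        sq = np.einsum("ij,jk,ik->i", diff, self.cov_inv_, diff)
        return np.sqrt(np.maximum(sq, 0.0))

    def fit(self, X_train):
        Z = self.scaler.fit_transform(X_train)
        self.mean_ = Z.mean(axis=0)
        # Pseudo-inverse tolerates a singular covariance matrix.
        self.cov_inv_ = np.linalg.pinv(np.cov(Z, rowvar=False))
        self.ocsvm.fit(Z)
        # Reference scales taken from the training set (an assumption).
        self.d_ref_ = np.percentile(self._mahalanobis(Z), 95) + 1e-12
        self.s_ref_ = np.abs(self.ocsvm.decision_function(Z)).max() + 1e-12
        return self

    def score(self, X_new):
        Z = self.scaler.transform(X_new)
        # Similarity in (0, 1]: 1 at the training mean, decaying with distance.
        similarity = np.exp(-self._mahalanobis(Z) / self.d_ref_)
        # Logistic map of the SVM decision value: above 0.5 for inliers.
        s = self.ocsvm.decision_function(Z) / self.s_ref_
        inlier = 1.0 / (1.0 + np.exp(-s))
        # Equal weighting is an arbitrary choice for illustration.
        return 0.5 * (similarity + inlier)
```

In use, the index would be fitted once on the training features and then evaluated on each new sample, with low values flagging inputs that lie far from the training distribution and whose predictions deserve less trust.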
| Original language | English |
|---|---|
| Article number | 124823 |
| Journal | Water Research |
| Volume | 289 |
| DOIs | |
| State | Published - 15 Jan 2026 |
Keywords
- Data imputation
- Dichloroacetonitrile (DCAN)
- Disinfection byproducts (DBPs)
- Generalizability
- SHAP interaction
- Synthetic data
Fingerprint
Dive into the research topics of 'Toward generalizable machine learning models for dichloroacetonitrile formation: Interpretable insights and a framework for model reliability'. Together they form a unique fingerprint.