TY - JOUR
T1 - Tuberculosis and pneumonia diagnosis in chest X-rays by large adaptive filter and aligning normalized network with report-guided multi-level alignment
AU - Lu, Si Yuan
AU - Zhu, Ziquan
AU - Zhang, Yu Dong
AU - Yao, Yu Dong
N1 - Publisher Copyright: © 2025 Elsevier Ltd
PY - 2025/10/15
Y1 - 2025/10/15
N2 - Tuberculosis (TB) and pneumonia remain major global public health challenges, necessitating accurate and efficient diagnostic tools. This study proposes a novel deep learning framework, Large Adaptive Filter and Aligning Normalized Network (LAFAN-Net), designed to improve chest X-ray (CXR) diagnosis by integrating visual and textual information. The framework comprises three key components: (1) a report-guided multi-level alignment mechanism that aligns CXR features with radiology reports at the token, sample, and disease levels; (2) a large adaptive filter block for capturing multi-scale visual patterns; and (3) AlignNorm, a new normalization technique that mitigates oversmoothing and enhances feature separation. LAFAN-Net is evaluated on three publicly available CXR datasets, achieving accuracies of 97.14 %, 95.35 %, and 89.39 %, and F1 scores of 90.77 %, 96.32 %, and 88.33 %, respectively. Extensive ablation studies confirm the model's robustness. The results underscore LAFAN-Net's ability to extract clinically meaningful features while maintaining interpretability, supported by singular value distributions and Gradient-weighted Class Activation Mapping visualizations. Future work will explore extending the model to broader disease categories and multi-class classification tasks to enhance clinical utility. In addition, improving computational efficiency and ensuring real-time applicability are essential for deployment in resource-limited settings.
KW - Chest X-ray
KW - Computer-aided diagnosis
KW - Multi-modal
KW - Pneumonia
KW - Tuberculosis
UR - https://www.scopus.com/pages/publications/105008818106
U2 - 10.1016/j.engappai.2025.111575
DO - 10.1016/j.engappai.2025.111575
M3 - Article
AN - SCOPUS:105008818106
SN - 0952-1976
VL - 158
JO - Engineering Applications of Artificial Intelligence
JF - Engineering Applications of Artificial Intelligence
M1 - 111575
ER -