Addressing Overfitting in an Imbalanced Dataset for MS Progression Prediction

  • Shima Pilehvari
  • Wei Peng
  • Yasser Morgan
  • Mohammad Ali Sahraian
  • Sharareh Eskandarieh

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Overfitting is a common problem during model training, particularly for binary medical datasets with class imbalance. This research specifically addresses this issue in predicting Multiple Sclerosis (MS) progression, with the primary goal of improving model accuracy and reliability. By investigating various data resampling techniques, ensemble methods, feature extraction, and model regularization, the study thoroughly evaluates the effectiveness of these strategies in enhancing stability and performance for highly imbalanced datasets. Compared to prior studies, this research advances existing approaches by integrating Kernel Principal Component Analysis (KPCA), moderate under-sampling, Synthetic Minority Oversampling Technique (SMOTE), and post-processing techniques, including Youden’s J Statistic and manual threshold adjustments. This comprehensive strategy significantly reduced overfitting while improving the generalization of models, particularly the Multilayer Perceptron (MLP), which achieved an Area Under the Curve (AUC) of 0.98—outperforming previous models in similar applications. These findings establish important best practices for developing robust prognostic models for MS progression and underscore the importance of tailored solutions in complex medical prediction tasks.
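Of the post-processing steps named in the abstract, Youden's J statistic is the easiest to illustrate. The sketch below is not the authors' code: it is a minimal pure-Python illustration that assumes binary labels (1 for the minority class) and predicted scores where higher means more likely positive. The function name `youden_threshold` and the toy data are hypothetical. It picks the decision threshold that maximizes J = sensitivity + specificity - 1.

```python
def youden_threshold(y_true, scores):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A sample is predicted positive when its score is >= threshold.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    best_t, best_j = None, -1.0
    # Every distinct score is a candidate threshold.
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:  # strict '>' keeps the lowest threshold on ties
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical imbalanced toy data: 8 negatives, 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.22, 0.30, 0.35, 0.45, 0.40, 0.80]

best_t, best_j = youden_threshold(y_true, scores)
print(best_t, best_j)  # 0.4 0.875
```

On this toy data the default cutoff of 0.5 would miss one of the two positives, while the J-optimal cutoff of 0.4 recovers both at the cost of one false positive. That trade-off is the usual reason to adjust thresholds on imbalanced data.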

Original language: English
Title of host publication: Proceedings of 10th International Congress on Information and Communication Technology, ICICT 2025
Editors: Xin-She Yang, R. Simon Sherratt, Nilanjan Dey, Amit Joshi
Pages: 467-481
Number of pages: 15
State: Published - 2025
Event: 10th International Congress on Information and Communication Technology, ICICT 2025 - London, United Kingdom
Duration: 18 Feb 2025 to 21 Feb 2025

Publication series

Name: Lecture Notes in Networks and Systems
Volume: 1441 LNNS
ISSN (Print): 2367-3370
ISSN (Electronic): 2367-3389

Conference

Conference: 10th International Congress on Information and Communication Technology, ICICT 2025
Country/Territory: United Kingdom
City: London
Period: 18/02/25 to 21/02/25

Keywords

  • Feature extraction
  • Imbalanced data
  • Multiple sclerosis (MS)
  • Overfitting
  • Post-processing techniques
  • Resampling techniques
