Automated Identification of Machine Learning Technical Debt Code Comments

  • Omkar Khanvilkar
  • Mohamed Wiem Mkaouer
  • Eman Abdullah Alomar
  • Abdelrahman Elsaid
  • Amal Chaaben
  • Mohamed Touati

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The rapid integration of machine learning (ML) systems within software projects introduces complex maintenance challenges known as Machine Learning Technical Debt (MLTD). Identifying and managing this technical debt is crucial for the sustainability and efficiency of ML systems. This paper presents a novel framework that leverages natural language processing techniques to detect MLTD types using comments written in the source code. Specifically, the spaCy library's textcat_multilabel pipeline is employed to train a multi-label classification model designed to automatically classify code comments into distinct categories of MLTD, such as 'data debt', 'model debt', 'configuration debt', and 'environment debt'. The dataset includes thousands of manually annotated comments from several large-scale ML repositories, offering a diverse and comprehensive basis for training and testing the classifier. The approach is detailed through the preprocessing of the comment text, feature extraction, and the selection of appropriate model parameters. Challenges associated with the sparse and domain-specific language typical of code comments are also discussed. Evaluation metrics show that the classifier achieves robust accuracy and precision across different types of MLTD, providing developers and project managers with a practical tool for early detection and management of MLTD. By automating the identification of technical debt through code comments, this method not only enhances the maintainability of ML projects but also enriches the practices surrounding documentation and proactive debt management in the field of machine learning.
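The abstract does not give implementation details, so the following is only a minimal sketch of how annotated comments might be prepared and scored for a multi-label classifier. The label set is taken from the four categories named in the abstract. The cleaning rule, the function names (`clean_comment`, `to_cats`, `per_label_precision`), and the 0.5 decision threshold are illustrative assumptions, not the authors' pipeline.

```python
import re

# Label set taken from the categories named in the abstract.
LABELS = ("data debt", "model debt", "configuration debt", "environment debt")

# Assumed cleaning rule: drop a leading '#', '//' or '/*' comment marker.
_LEADING_MARKER = re.compile(r"^\s*(?:#+|//+|/\*+)\s*")
_TRAILING_MARKER = re.compile(r"\s*\*/\s*$")


def clean_comment(raw):
    """Strip comment markers and collapse runs of whitespace."""
    text = _LEADING_MARKER.sub("", raw)
    text = _TRAILING_MARKER.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


def to_cats(assigned):
    """Map the labels assigned to one comment to a {label: 0.0 or 1.0} dict.

    Labels are not mutually exclusive, so several entries may be 1.0.
    """
    unknown = set(assigned) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return {label: 1.0 if label in assigned else 0.0 for label in LABELS}


def per_label_precision(gold, predicted, threshold=0.5):
    """Compute precision per label from gold dicts and predicted score dicts.

    A label counts as predicted when its score reaches the threshold.
    Returns None for a label that was never predicted.
    """
    result = {}
    for label in LABELS:
        true_pos = false_pos = 0
        for gold_cats, scores in zip(gold, predicted):
            if scores[label] >= threshold:
                if gold_cats[label] == 1.0:
                    true_pos += 1
                else:
                    false_pos += 1
        total = true_pos + false_pos
        result[label] = true_pos / total if total else None
    return result


# Example records: (cleaned text, multi-label annotation).
records = [
    (clean_comment("#  TODO:   hard-coded   learning rate "),
     to_cats(["model debt", "configuration debt"])),
    (clean_comment("/* FIXME: dataset path is machine specific */"),
     to_cats(["data debt", "environment debt"])),
]
```

With spaCy installed, each record could plausibly be turned into a training example via `Example.from_dict(nlp.make_doc(text), {"cats": cats})` for a pipeline built with `nlp.add_pipe("textcat_multilabel")`; whether the paper follows exactly this route is not stated in the abstract.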

Original language: English
Title of host publication: Proceedings - 2025 IEEE International Conference on Emerging Technologies and Computing, IC_ETC 2025
ISBN (Electronic): 9798331587475
State: Published - 2025
Event: 2025 IEEE International Conference on Emerging Technologies and Computing, IC_ETC 2025 - Brest, France
Duration: 23 Jun 2025 to 26 Jun 2025

Publication series

Name: Proceedings - 2025 IEEE International Conference on Emerging Technologies and Computing, IC_ETC 2025

Conference

Conference: 2025 IEEE International Conference on Emerging Technologies and Computing, IC_ETC 2025
Country/Territory: France
City: Brest
Period: 23/06/25 to 26/06/25

Keywords

  • machine learning
  • quality
  • technical debt
