TY - JOUR
T1 - How we refactor and how we document it? On the use of supervised machine learning algorithms to classify refactoring documentation
AU - AlOmar, Eman Abdullah
AU - Peruma, Anthony
AU - Mkaouer, Mohamed Wiem
AU - Newman, Christian
AU - Ouni, Ali
AU - Kessentini, Marouane
N1 - Publisher Copyright:
© 2020 Elsevier Ltd
PY - 2021/4/1
Y1 - 2021/4/1
N2 - Refactoring is the art of improving the structural design of a software system without altering its external behavior. Today, refactoring has become a well-established and disciplined software engineering practice that has attracted a significant amount of research presuming that refactoring is primarily motivated by the need to improve system structures. However, recent studies have shown that developers may incorporate refactoring strategies in other development-related activities that go beyond improving the design especially with the emerging challenges in contemporary software engineering. Unfortunately, these studies are limited to developer interviews and a reduced set of projects. To cope with the above-mentioned limitations, we aim to better understand what motivates developers to apply a refactoring by mining and automatically classifying a large set of 111,884 commits containing refactoring activities, extracted from 800 open source Java projects. We trained a multi-class classifier to categorize these commits into three categories, namely, Internal Quality Attribute, External Quality Attribute, and Code Smell Resolution, along with the traditional Bug Fix and Functional categories. This classification challenges the original definition of refactoring, being exclusive to improving software design and fixing code smells. Furthermore, to better understand our classification results, we qualitatively analyzed commit messages to extract textual patterns that developers regularly use to describe their refactoring activities. The results of our empirical investigation show that (1) fixing code smells is not the main driver for developers to refactoring their code bases. Refactoring is solicited for a wide variety of reasons, going beyond its traditional definition; (2) the distribution of refactoring operations differs between production and test files; (3) developers use a variety of patterns to purposefully target refactoring-related activities; (4) the textual patterns, extracted from commit messages, provide better coverage for how developers document their refactorings.
AB - Refactoring is the art of improving the structural design of a software system without altering its external behavior. Today, refactoring has become a well-established and disciplined software engineering practice that has attracted a significant amount of research presuming that refactoring is primarily motivated by the need to improve system structures. However, recent studies have shown that developers may incorporate refactoring strategies in other development-related activities that go beyond improving the design especially with the emerging challenges in contemporary software engineering. Unfortunately, these studies are limited to developer interviews and a reduced set of projects. To cope with the above-mentioned limitations, we aim to better understand what motivates developers to apply a refactoring by mining and automatically classifying a large set of 111,884 commits containing refactoring activities, extracted from 800 open source Java projects. We trained a multi-class classifier to categorize these commits into three categories, namely, Internal Quality Attribute, External Quality Attribute, and Code Smell Resolution, along with the traditional Bug Fix and Functional categories. This classification challenges the original definition of refactoring, being exclusive to improving software design and fixing code smells. Furthermore, to better understand our classification results, we qualitatively analyzed commit messages to extract textual patterns that developers regularly use to describe their refactoring activities. The results of our empirical investigation show that (1) fixing code smells is not the main driver for developers to refactoring their code bases. Refactoring is solicited for a wide variety of reasons, going beyond its traditional definition; (2) the distribution of refactoring operations differs between production and test files; (3) developers use a variety of patterns to purposefully target refactoring-related activities; (4) the textual patterns, extracted from commit messages, provide better coverage for how developers document their refactorings.
KW - Machine learning
KW - Refactoring
KW - Software engineering
KW - Software quality
UR - http://www.scopus.com/inward/record.url?scp=85095827518&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85095827518&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2020.114176
DO - 10.1016/j.eswa.2020.114176
M3 - Article
AN - SCOPUS:85095827518
SN - 0957-4174
VL - 167
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 114176
ER -