TY - GEN
T1 - Intelligent Code Review Assignment for Large Scale Open Source Software Stacks
AU - Aryendu, Ishan
AU - Wang, Ying
AU - Elkourdi, Farah
AU - Alomar, Eman Abdullah
N1 - Publisher Copyright:
© 2022 ACM.
PY - 2022/9/19
Y1 - 2022/9/19
N2 - In the process of developing software, code review is crucial. By identifying problems before they arise in production, it enhances the quality of the code. Finding the best reviewer for a code change, however, is extremely challenging especially in large scale, especially open source software stacks with cross functioning designs and collaborations among multiple developers and teams. Additionally, a review by someone who lacks knowledge and understanding of the code can result in high resource consumption and technical errors. The reviewers who have the specialty in both functioning (domain knowledge) and non-functioning areas of a commit are considered as the most qualified reviewer to look over any changes to the code. Quality attributes serve as the connection among the user requirements, delivered function description, software architecture and implementation through put the entire software stack cycle. In this study, we target on auto reviewer assignment in large scale software stacks and aim to build a self-learning, and self-correct platform for intelligently matching between a commit based on its quality attributes and the skills sets of reviewers. To achieve this, quality attributes are classified and abstracted from the commit messages and based on which, the commits are assigned to the reviewers with the capability in reviewing the target commits. We first designed machine learning schemes for abstracting quality attributes based on historical data from the OpenStack repository. Two models are built and trained for automating the classification of the commits based on their quality attributes using the manual labeling of commits and multi-class classifiers. We then positioned the reviewers based on their historical data and the quality attributes characteristics. Finally we selected the recommended reviewer based on the distance between a commit and candidate reviewers. In this paper, we demonstrate how the models can choose the best quality attributes and assign the code review to the most qualified reviewers. With a comparatively small training dataset, the models are able to achieve F-1 scores of 77% and 85.31%, respectively.
AB - In the process of developing software, code review is crucial. By identifying problems before they arise in production, it enhances the quality of the code. Finding the best reviewer for a code change, however, is extremely challenging especially in large scale, especially open source software stacks with cross functioning designs and collaborations among multiple developers and teams. Additionally, a review by someone who lacks knowledge and understanding of the code can result in high resource consumption and technical errors. The reviewers who have the specialty in both functioning (domain knowledge) and non-functioning areas of a commit are considered as the most qualified reviewer to look over any changes to the code. Quality attributes serve as the connection among the user requirements, delivered function description, software architecture and implementation through put the entire software stack cycle. In this study, we target on auto reviewer assignment in large scale software stacks and aim to build a self-learning, and self-correct platform for intelligently matching between a commit based on its quality attributes and the skills sets of reviewers. To achieve this, quality attributes are classified and abstracted from the commit messages and based on which, the commits are assigned to the reviewers with the capability in reviewing the target commits. We first designed machine learning schemes for abstracting quality attributes based on historical data from the OpenStack repository. Two models are built and trained for automating the classification of the commits based on their quality attributes using the manual labeling of commits and multi-class classifiers. We then positioned the reviewers based on their historical data and the quality attributes characteristics. Finally we selected the recommended reviewer based on the distance between a commit and candidate reviewers. In this paper, we demonstrate how the models can choose the best quality attributes and assign the code review to the most qualified reviewers. With a comparatively small training dataset, the models are able to achieve F-1 scores of 77% and 85.31%, respectively.
KW - Code Review
KW - Commit Classification
KW - Large-scale
KW - MPNet
KW - Machine Learning
KW - Open-source
UR - http://www.scopus.com/inward/record.url?scp=85146960689&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85146960689&partnerID=8YFLogxK
U2 - 10.1145/3551349.3561147
DO - 10.1145/3551349.3561147
M3 - Conference contribution
AN - SCOPUS:85146960689
T3 - ACM International Conference Proceeding Series
BT - 37th IEEE/ACM International Conference on Automated Software Engineering, ASE 2022
A2 - Aehnelt, Mario
A2 - Kirste, Thomas
T2 - 37th IEEE/ACM International Conference on Automated Software Engineering, ASE 2022
Y2 - 10 October 2022 through 14 October 2022
ER -