TY - GEN
T1 - Automatically identifying performance issue reports with heuristic linguistic patterns
AU - Zhao, Yutong
AU - Xiao, Lu
AU - Babvey, Pouria
AU - Sun, Lei
AU - Wong, Sunny
AU - Martinez, Angel A.
AU - Wang, Xiao
N1 - Publisher Copyright:
© 2020 ACM.
PY - 2020/11/8
Y1 - 2020/11/8
N2 - Performance issues compromise the response time and resource consumption of a software system. Modern software systems use issue tracking systems to manage all kinds of issue reports, including performance issues. The problem is that performance issues are often not explicitly tagged. The tagging mechanism, if exists, is completely voluntary, depending on the project's convention and on submitters' discipline. For example, the performance tag rate in Apache's Jira system is below 1%. This paper contributes a hybrid classification approach that combines linguistic patterns and machine/deep learning techniques to automatically detect performance issue reports. We manually analyzed 980 real-life performance issue reports and derived 80 project-agnostic linguistic patterns that recur in the reports. Our approach uses these linguistic patterns to construct the sentence-level and issue-level learning features for training effective machine/deep learning classifiers. We test our approach on two separate datasets, each consisting of 980 unclassified issue reports, and compare the results with 31 baseline methods. Our approach can reach up to 83% precision and up to 59% recall. The only comparable baseline method is BERT, which is still 25% lower in the F1-score.
AB - Performance issues compromise the response time and resource consumption of a software system. Modern software systems use issue tracking systems to manage all kinds of issue reports, including performance issues. The problem is that performance issues are often not explicitly tagged. The tagging mechanism, if exists, is completely voluntary, depending on the project's convention and on submitters' discipline. For example, the performance tag rate in Apache's Jira system is below 1%. This paper contributes a hybrid classification approach that combines linguistic patterns and machine/deep learning techniques to automatically detect performance issue reports. We manually analyzed 980 real-life performance issue reports and derived 80 project-agnostic linguistic patterns that recur in the reports. Our approach uses these linguistic patterns to construct the sentence-level and issue-level learning features for training effective machine/deep learning classifiers. We test our approach on two separate datasets, each consisting of 980 unclassified issue reports, and compare the results with 31 baseline methods. Our approach can reach up to 83% precision and up to 59% recall. The only comparable baseline method is BERT, which is still 25% lower in the F1-score.
KW - Performance optimization
KW - Software performance
KW - Software repositories mining
UR - http://www.scopus.com/inward/record.url?scp=85097161071&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85097161071&partnerID=8YFLogxK
U2 - 10.1145/3368089.3409674
DO - 10.1145/3368089.3409674
M3 - Conference contribution
AN - SCOPUS:85097161071
T3 - ESEC/FSE 2020 - Proceedings of the 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
SP - 964
EP - 975
BT - ESEC/FSE 2020 - Proceedings of the 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
A2 - Devanbu, Prem
A2 - Cohen, Myra
A2 - Zimmermann, Thomas
T2 - 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2020
Y2 - 8 November 2020 through 13 November 2020
ER -