TY - JOUR
T1 - A Platform-Agnostic Framework for Automatically Identifying Performance Issue Reports with Heuristic Linguistic Patterns
AU - Zhao, Yutong
AU - Xiao, Lu
AU - Wong, Sunny
N1 - Publisher Copyright: © 2024 The Authors.
PY - 2024
Y1 - 2024
AB - Software performance is critical for system efficiency, with performance issues potentially resulting in budget overruns, project delays, and market losses. Such problems are reported to developers through issue tracking systems, which are often under-tagged, as the manual tagging process is voluntary and time-consuming. Existing automated performance issue tagging techniques, such as keyword matching and machine/deep learning models, struggle due to imbalanced datasets and a high degree of variance. This paper presents a novel hybrid classification approach, combining Heuristic Linguistic Patterns (HLPs) with machine/deep learning models to enable practitioners to automatically identify performance-related issues. The proposed approach works across three progressive levels: HLP tagging, sentence tagging, and issue tagging, with a focus on linguistic analysis of issue descriptions. The authors evaluate the approach on three different datasets collected from different projects and issue-tracking platforms to prove that the proposed framework is accurate, project- and platform-agnostic, and robust to imbalanced datasets. Furthermore, this study also examined how the two unique techniques of the framework, including the fuzzy HLP matching and the Issue HLP Matrix, contribute to the accuracy. Finally, the study explored the effectiveness and impact of two off-the-shelf feature selection techniques, Boruta and RFE, with the proposed framework. The results showed that the proposed framework has great potential for practitioners to accurately (with up to 100% precision, 66% recall, and 79% F1-score) identify performance issues, with robustness to imbalanced data and good transferability to new projects and issue tracking platforms.
KW - automatic text classification
KW - linguistic pattern
KW - Software performance
KW - software repository mining
UR - http://www.scopus.com/inward/record.url?scp=85190749141&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85190749141&partnerID=8YFLogxK
U2 - 10.1109/TSE.2024.3390623
DO - 10.1109/TSE.2024.3390623
M3 - Article
AN - SCOPUS:85190749141
SN - 0098-5589
VL - 50
SP - 1704
EP - 1725
JO - IEEE Transactions on Software Engineering
JF - IEEE Transactions on Software Engineering
IS - 7
ER -