TY - GEN
T1 - Fine-Grained Just-In-Time Defect Prediction at the Block Level in Infrastructure-as-Code (IaC)
AU - Begoug, Mahi
AU - Chouchen, Moataz
AU - Ouni, Ali
AU - Alomar, Eman Abdullah
AU - Mkaouer, Mohamed Wiem
N1 - Publisher Copyright: © 2024 ACM.
PY - 2024
Y1 - 2024
N2 - Infrastructure-as-Code (IaC) is an emerging software engineering practice that leverages source code to facilitate automated configuration of software systems' infrastructure. IaC files are typically complex, containing hundreds of lines of code and dependencies, making them prone to defects, which can result in breaking online services at scale. To help developers identify and fix IaC defects early, research efforts have introduced IaC defect prediction models at the file level. However, the granularity of the proposed approaches remains coarse-grained, requiring developers to inspect hundreds of lines of code in a file, while only a small fragment of code is defective. To alleviate this issue, we introduce a machine-learning-based approach to predict IaC defects at a fine-grained level, focusing on IaC blocks, i.e., small code units that encapsulate specific behaviours within an IaC file. We trained various machine learning algorithms based on a mixture of code, process, and change-level metrics. We evaluated our approach on 19 open-source projects that use Terraform, a widely used IaC tool. The results indicated that there is no single algorithm that consistently outperforms the others across the 19 projects. Overall, among the six algorithms, we observed that the LightGBM model achieved a higher average of 0.21 in terms of MCC and 0.71 in terms of AUC. Model analysis reveals that the developer's experience and the relative number of added lines tend to be the most important features. Additionally, we found that blocks belonging to the most frequent types are more prone to defects. Our defect prediction models have also shown sensitivity to concept drift, indicating that IaC practitioners should regularly retrain their models.
AB - Infrastructure-as-Code (IaC) is an emerging software engineering practice that leverages source code to facilitate automated configuration of software systems' infrastructure. IaC files are typically complex, containing hundreds of lines of code and dependencies, making them prone to defects, which can result in breaking online services at scale. To help developers identify and fix IaC defects early, research efforts have introduced IaC defect prediction models at the file level. However, the granularity of the proposed approaches remains coarse-grained, requiring developers to inspect hundreds of lines of code in a file, while only a small fragment of code is defective. To alleviate this issue, we introduce a machine-learning-based approach to predict IaC defects at a fine-grained level, focusing on IaC blocks, i.e., small code units that encapsulate specific behaviours within an IaC file. We trained various machine learning algorithms based on a mixture of code, process, and change-level metrics. We evaluated our approach on 19 open-source projects that use Terraform, a widely used IaC tool. The results indicated that there is no single algorithm that consistently outperforms the others across the 19 projects. Overall, among the six algorithms, we observed that the LightGBM model achieved a higher average of 0.21 in terms of MCC and 0.71 in terms of AUC. Model analysis reveals that the developer's experience and the relative number of added lines tend to be the most important features. Additionally, we found that blocks belonging to the most frequent types are more prone to defects. Our defect prediction models have also shown sensitivity to concept drift, indicating that IaC practitioners should regularly retrain their models.
KW - Defect Prediction
KW - IaC
KW - Infrastructure-as-Code
KW - Terraform
UR - http://www.scopus.com/inward/record.url?scp=85197260220&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85197260220&partnerID=8YFLogxK
U2 - 10.1145/3643991.3644934
DO - 10.1145/3643991.3644934
M3 - Conference contribution
AN - SCOPUS:85197260220
T3 - Proceedings - 2024 IEEE/ACM 21st International Conference on Mining Software Repositories, MSR 2024
SP - 100
EP - 112
BT - Proceedings - 2024 IEEE/ACM 21st International Conference on Mining Software Repositories, MSR 2024
T2 - 21st IEEE/ACM International Conference on Mining Software Repositories, MSR 2024
Y2 - 15 April 2024 through 16 April 2024
ER -