TY - JOUR
T1 - STG2P
T2 - A two-stage pipeline model for intrusion detection based on improved LightGBM and K-means
AU - Zhang, Zhiqiang
AU - Wang, Le
AU - Chen, Guangyao
AU - Gu, Zhaoquan
AU - Tian, Zhihong
AU - Du, Xiaojiang
AU - Guizani, Mohsen
N1 - Publisher Copyright: © 2022 Elsevier B.V.
PY - 2022/11
Y1 - 2022/11
N2 - Network attack behavior is always mixed with a large number of normal communications, so attack characteristics account for only a very small fraction of the log data. From the perspective of simulation and modeling, the data for attack detection are extremely unbalanced if we regard attack behavior as the positive label. Network intrusion detection is an important topic in identifying attack behavior, but detection methods based on simulation and modeling, such as traditional machine learning, face the challenges of poor effectiveness and efficiency. Supervised models, such as LightGBM, can effectively classify abnormal data because of their fast training speed and high efficiency. However, they work badly when dealing with sparse negative data, such as network intrusion data. On the other hand, unsupervised models, such as K-means, can achieve good performance but at an undesirable training time cost. Moreover, it is difficult to select an appropriate parameter for network intrusion detection. In this paper, we propose a two-stage pipeline model named STG2P, which leverages an improved LightGBM and a reinforced K-means. Specifically, STG2P introduces a threshold for LightGBM in the coarse classification stage, and pipelines the draft results to K-means to filter out false positive samples in the fine classification stage. By adaptively adopting the pipelined data of the improved LightGBM and K-means, the method can avoid the shortcomings of both models. We also conduct extensive simulations on the LANL dataset, and the results show that the AUC value can be improved by as much as 29.48%. The detection rate of our method can reach 96.64%, which shows superior performance compared with some traditional detection methods.
KW - Improved LightGBM
KW - Intrusion detection
KW - Pipeline model
KW - Reinforced K-means
UR - http://www.scopus.com/inward/record.url?scp=85133608815&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85133608815&partnerID=8YFLogxK
U2 - 10.1016/j.simpat.2022.102614
DO - 10.1016/j.simpat.2022.102614
M3 - Article
AN - SCOPUS:85133608815
SN - 1569-190X
VL - 120
JO - Simulation Modelling Practice and Theory
JF - Simulation Modelling Practice and Theory
M1 - 102614
ER -