TY - JOUR
T1 - A Heuristic Statistical Testing Based Approach for Encrypted Network Traffic Identification
AU - Niu, Weina
AU - Zhuo, Zhongliu
AU - Zhang, Xiaosong
AU - Du, Xiaojiang
AU - Yang, Guowu
AU - Guizani, Mohsen
N1 - Publisher Copyright: © 1967-2012 IEEE.
PY - 2019/4
Y1 - 2019/4
AB - In recent years, malware with strong concealment capabilities has used encrypted protocols to evade detection. Encrypted traffic identification can therefore help security analysts work more effectively by narrowing down the encrypted network traffic. Existing protocol-independent methods include statistical-based and machine-learning-based approaches. Statistical-based approaches, however, are confined to payload length, and machine-learning-based approaches have a low recognition rate for encrypted traffic that uses undisclosed protocols. In this paper, we proposed a heuristic statistical testing (HST) approach that combines statistics and machine learning and is shown to alleviate their respective deficiencies. We manually selected four randomness tests to extract small payload features for machine learning to improve real-time performance. We also proposed a simple handshake skipping method called HST-R to increase classification accuracy. We compared our approach with other identification approaches on a testing dataset consisting of traffic that uses two known, two undisclosed, and one custom cryptographic protocol. Experimental results showed that HST-R performs better than traditional coding-based, entropy-based, and ML-based approaches. We also showed that our handshake skipping method generalizes better to unknown cryptographic protocols. Finally, we conducted experimental comparisons among different classification algorithms. The results showed that C4.5, with our method, has the highest identification accuracy for Secure Sockets Layer (SSL) and Secure Shell (SSH) traffic.
KW - Encrypted traffic identification
KW - handshake skipping
KW - machine learning
KW - protocol-independent
KW - statistical testing
UR - http://www.scopus.com/inward/record.url?scp=85064666463&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85064666463&partnerID=8YFLogxK
U2 - 10.1109/TVT.2019.2894290
DO - 10.1109/TVT.2019.2894290
M3 - Article
AN - SCOPUS:85064666463
SN - 0018-9545
VL - 68
SP - 3843
EP - 3853
JO - IEEE Transactions on Vehicular Technology
JF - IEEE Transactions on Vehicular Technology
IS - 4
M1 - 8620362
ER -