TY - JOUR
T1 - System log detection model based on conformal prediction
AU - Ren, Yitong
AU - Gu, Zhaojun
AU - Wang, Zhi
AU - Tian, Zhihong
AU - Liu, Chunbo
AU - Lu, Hui
AU - Du, Xiaojiang
AU - Guizani, Mohsen
N1 - Publisher Copyright: © 2020 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2020/2
Y1 - 2020/2
AB - With the rapid development of the Internet of Things (IoT), combining IoT with machine learning, Hadoop, and other fields has become a current development trend. The Hadoop Distributed File System (HDFS), one of the core components of Hadoop, processes files that are divided into data blocks distributed across the cluster. Anomalies in the distributed log data can cause serious losses. When machine learning algorithms are used for system log anomaly detection, threshold-based classification models output only a simple normal-or-abnormal prediction. This paper used the statistical learning method of the conformity measure to calculate the similarity between test data and past experience. Compared with detection methods based on a static threshold, the conformity-measure method can adapt dynamically to changing log data. By adjusting the maximum fault tolerance, a system administrator can better manage and monitor the system logs. In addition, the computational efficiency of the conformity-measure method was improved. This paper implemented an intranet anomaly detection model based on log analysis and conducted trial detection on HDFS data sets quickly and efficiently.
KW - Anomaly detection
KW - Conformal prediction
KW - Confusion matrix
KW - HDFS
UR - http://www.scopus.com/inward/record.url?scp=85079501873&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85079501873&partnerID=8YFLogxK
U2 - 10.3390/electronics9020232
DO - 10.3390/electronics9020232
M3 - Article
AN - SCOPUS:85079501873
VL - 9
JO - Electronics (Switzerland)
JF - Electronics (Switzerland)
IS - 2
M1 - 232
ER -