TY - GEN
T1 - A Malware Detection Method for Health Sensor Data Based on Machine Learning
AU - Liu, Hanwen
AU - Helu, Xiaohan
AU - Jin, Chengjie
AU - Lu, Hui
AU - Tian, Zhihong
AU - Du, Xiaojiang
AU - Abualsaud, Khalid
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/2
Y1 - 2020/2
AB - Traditional signature-based malware detection approaches are sensitive to small changes in the malware code. Currently, most malware programs are adapted from existing programs; hence, they share some common patterns but have different signatures. For health sensor data, it is necessary to identify the malware pattern rather than only detect the small changes. To detect malware programs in health sensor data in a timely manner, we propose a fast detection strategy that detects the patterns in the code with machine learning-based approaches. In particular, XGBoost, LightGBM and Random Forests are used to analyze the code from health sensor data. The code is fed into the models as single bytes/tokens or as sequences of bytes/tokens (e.g. 1-, 2-, 3-, or 4-grams). Terabytes of labeled programs, including benign and malware programs, have been collected. The challenges of this task are to select and extract the features, to adapt the three models in order to train and test them on the dataset, which consists of health sensor data, and to evaluate the features and models. When a malware program is detected by one model, its pattern is broadcast to the other models, which effectively prevents intrusion by malware programs.
KW - common pattern
KW - health sensor data
KW - machine learning
KW - malware detection
UR - http://www.scopus.com/inward/record.url?scp=85085477733&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85085477733&partnerID=8YFLogxK
U2 - 10.1109/ICIoT48696.2020.9089478
DO - 10.1109/ICIoT48696.2020.9089478
M3 - Conference contribution
AN - SCOPUS:85085477733
T3 - 2020 IEEE International Conference on Informatics, IoT, and Enabling Technologies, ICIoT 2020
SP - 277
EP - 282
BT - 2020 IEEE International Conference on Informatics, IoT, and Enabling Technologies, ICIoT 2020
T2 - 2020 IEEE International Conference on Informatics, IoT, and Enabling Technologies, ICIoT 2020
Y2 - 2 February 2020 through 5 February 2020
ER -