TY - JOUR
T1 - An Unbalanced Data Hybrid-Sampling Algorithm Based on Multi-Information Fusion
AU - Chen, Sijia
AU - Song, Bin
AU - Guo, Jie
AU - Du, Xiaojiang
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017
Y1 - 2017
N2 - The emergence of big data brings new issues and challenges for the data imbalance problem. Therefore, unbalanced data sampling technology has been a hot research topic in the field of big data. However, the existing sampling methods cannot accurately define the harmful and useless samples contained in the original dataset. That is, based on the single information of the dataset, a large number of actually harmful samples are being used for sampling, which results in a sharp decline in the identifiable performance of the sampled data. In order to overcome the problems caused by only using one kind of information, an unbalanced data hybrid-sampling algorithm based on multi-information fusion (MIFS) is presented in this paper. The MIFS combines the feature information learned by the boosting model with the position information of the data to define the sample, and then divides the samples into different subsets by the information contained. According to the definition of samples, the algorithm performs corresponding under-sampling and over-sampling on these subsets. Experiments show that the MIFS method can improve the performance of sampling operations and produce a high F-score and AUC against both minority and majority classes in the classification of balanced data.
AB - The emergence of big data brings new issues and challenges for the data imbalance problem. Therefore, unbalanced data sampling technology has been a hot research topic in the field of big data. However, the existing sampling methods cannot accurately define the harmful and useless samples contained in the original dataset. That is, based on the single information of the dataset, a large number of actually harmful samples are being used for sampling, which results in a sharp decline in the identifiable performance of the sampled data. In order to overcome the problems caused by only using one kind of information, an unbalanced data hybrid-sampling algorithm based on multi-information fusion (MIFS) is presented in this paper. The MIFS combines the feature information learned by the boosting model with the position information of the data to define the sample, and then divides the samples into different subsets by the information contained. According to the definition of samples, the algorithm performs corresponding under-sampling and over-sampling on these subsets. Experiments show that the MIFS method can improve the performance of sampling operations and produce a high F-score and AUC against both minority and majority classes in the classification of balanced data.
UR - http://www.scopus.com/inward/record.url?scp=85046435259&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85046435259&partnerID=8YFLogxK
U2 - 10.1109/GLOCOM.2017.8254481
DO - 10.1109/GLOCOM.2017.8254481
M3 - Conference article
AN - SCOPUS:85046435259
SN - 2334-0983
VL - 2018-January
SP - 1
EP - 7
JO - Proceedings - IEEE Global Communications Conference, GLOBECOM
JF - Proceedings - IEEE Global Communications Conference, GLOBECOM
T2 - 2017 IEEE Global Communications Conference, GLOBECOM 2017
Y2 - 4 December 2017 through 8 December 2017
ER -