An Unbalanced Data Hybrid-Sampling Algorithm Based on Multi-Information Fusion

Sijia Chen, Bin Song, Jie Guo, Xiaojiang Du

Research output: Contribution to journal; Conference article; Peer-review

Scopus citations: 1

Abstract

The emergence of big data brings new issues and challenges to the data imbalance problem. Unbalanced data sampling has therefore become a hot research topic in the field of big data. However, existing sampling methods cannot accurately identify the harmful and useless samples contained in the original dataset. Because they rely on only a single kind of information about the dataset, a large number of samples that are actually harmful are used for sampling, which sharply reduces how well the sampled data can be classified. To overcome the problems caused by using only one kind of information, this paper presents an unbalanced data hybrid-sampling algorithm based on multi-information fusion (MIFS). MIFS combines the feature information learned by a boosting model with the position information of the data to define each sample, and then divides the samples into subsets according to the information they contain. Based on these sample definitions, the algorithm performs the corresponding under-sampling and over-sampling on each subset. Experiments show that MIFS improves the performance of the sampling operations and, when the balanced data are classified, yields high F-scores and AUC for both the minority and majority classes.
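The abstract does not give the exact rules MIFS uses, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes the "feature information" is the final AdaBoost sample weight (samples the ensemble keeps misclassifying become heavy), the "position information" is the share of same-class samples among the k nearest neighbours, and labels are -1 (majority) and +1 (minority). All function names (`stump_fit`, `boosting_weights`, `neighbours`, `partition`, `hybrid_sample`), the thresholds, and the three subset names are hypothetical. Over-sampling is a swapped-in SMOTE-style interpolation that prefers borderline seeds, in the manner of Borderline-SMOTE.

```python
import math
import random

# Hypothetical sketch of boosting + neighbourhood fusion for hybrid sampling.
# Assumption: labels are -1 (majority) and +1 (minority).


def stump_fit(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best decision stump."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (pol if xi[j] <= t else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, pol)
    return best


def boosting_weights(X, y, rounds=10):
    """Run AdaBoost with stumps and return the final sample-weight distribution.

    Samples that the ensemble keeps misclassifying accumulate large weights;
    this stands in for the "feature information" learned by boosting.
    """
    n = len(X)
    w = [1.0 / n] * n
    for _ in range(rounds):
        err, j, t, pol = stump_fit(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        if err >= 0.5:  # weak learner no better than chance: stop boosting
            break
        alpha = 0.5 * math.log((1 - err) / err)
        preds = [pol if x[j] <= t else -pol for x in X]
        w = [wi * math.exp(-alpha * yi * pi) for wi, yi, pi in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]  # renormalise so the weights sum to 1
    return w


def neighbours(X, i, k):
    """Indices of the k nearest neighbours of X[i] (Euclidean, excluding i itself)."""
    d = sorted((math.dist(X[i], X[m]), m) for m in range(len(X)) if m != i)
    return [m for _, m in d[:k]]


def partition(X, y, k=5, rounds=10):
    """Assign each sample to 'noise', 'border' or 'safe' by fusing two signals:
    boosting weight (hard vs. easy) and the share of same-class neighbours."""
    w = boosting_weights(X, y, rounds)
    mean_w = 1.0 / len(X)  # weights sum to 1, so this is their mean
    groups = []
    for i in range(len(X)):
        nb = neighbours(X, i, k)
        same = sum(y[m] == y[i] for m in nb) / max(len(nb), 1)
        hard = w[i] > mean_w
        if hard and same < 0.5:  # both signals say the sample is out of place
            groups.append("noise")
        elif hard or same < 1:  # at least one signal flags it, but not as noise
            groups.append("border")
        else:
            groups.append("safe")
    return groups


def hybrid_sample(X, y, k=5, rounds=10, seed=0):
    """Under-sample by dropping 'noise', then SMOTE-style over-sample the minority."""
    rng = random.Random(seed)
    groups = partition(X, y, k, rounds)
    keep = [i for i, g in enumerate(groups) if g != "noise"]
    Xs = [list(X[i]) for i in keep]
    ys = [y[i] for i in keep]
    gs = [groups[i] for i in keep]
    minority = [i for i, lab in enumerate(ys) if lab == 1]
    n_major = len(ys) - len(minority)
    # Prefer 'border' minority samples as interpolation seeds (Borderline-SMOTE idea).
    seeds = [i for i in minority if gs[i] == "border"] or minority
    synthetic = []
    while len(minority) > 1 and len(minority) + len(synthetic) < n_major:
        i = rng.choice(seeds)
        near = sorted((math.dist(Xs[i], Xs[m]), m) for m in minority if m != i)[:k]
        j = rng.choice(near)[1]  # a random one of the k nearest minority samples
        lam = rng.random()
        synthetic.append([a + lam * (b - a) for a, b in zip(Xs[i], Xs[j])])
    return Xs + synthetic, ys + [1] * len(synthetic)
```

Requiring both signals to agree before a sample is discarded is this sketch's way of illustrating why fusing two kinds of information can be safer than trusting either one alone; the mean-weight and 0.5 neighbour-share thresholds are arbitrary choices, not values from the paper.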

Original language: English
Pages (from-to): 1-7
Number of pages: 7
Journal: Proceedings - IEEE Global Communications Conference, GLOBECOM
Volume: 2018-January
State: Published - 2017
Event: 2017 IEEE Global Communications Conference, GLOBECOM 2017 - Singapore, Singapore
Duration: 4 Dec 2017 to 8 Dec 2017

