Nearest Neighbor Distributions for imbalanced classification

Evan Kriminger, José C. Príncipe, Choudur Lakshminarayan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

26 Scopus citations

Abstract

The class imbalance problem is pervasive in machine learning. To accurately classify the minority class, current methods rely on sampling schemes to close the gap between classes, or on the application of error costs to create algorithms which favor the minority class. Since the sampling schemes and costs must be specified, these methods are highly dependent on the class distributions present in the training set. This makes them difficult to apply in settings where the level of imbalance changes, such as in online streaming data. Often they cannot handle multi-class problems. We present a novel single-class algorithm called Class Conditional Nearest Neighbor Distribution (CCNND), which mitigates the effects of class imbalance through local geometric structure in the data. Our algorithm can be applied seamlessly to problems with any level of imbalance or number of classes, and new examples are simply added to the training set. We show that it performs as well as or better than top sampling and cost-weighting methods on four imbalanced datasets from the UCI Machine Learning Repository, and then apply it to streaming data from the oil and gas industry alongside a modified nearest neighbor algorithm. Our algorithm's competitive performance relative to the state-of-the-art, coupled with its extremely simple implementation and automatic adjustment for minority classes, demonstrates that it is worth further study.

Original language: English
Title of host publication: 2012 International Joint Conference on Neural Networks, IJCNN 2012
State: Published - 2012
Event: 2012 Annual International Joint Conference on Neural Networks, IJCNN 2012, Part of the 2012 IEEE World Congress on Computational Intelligence, WCCI 2012 - Brisbane, QLD, Australia
Duration: 10 Jun 2012 to 15 Jun 2012

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks

Conference

Conference: 2012 Annual International Joint Conference on Neural Networks, IJCNN 2012, Part of the 2012 IEEE World Congress on Computational Intelligence, WCCI 2012
Country/Territory: Australia
City: Brisbane, QLD
Period: 10/06/12 to 15/06/12
