TY - GEN
T1 - Nearest Neighbor Distributions for imbalanced classification
AU - Kriminger, Evan
AU - Príncipe, José C.
AU - Lakshminarayan, Choudur
PY - 2012
Y1 - 2012
N2 - The class imbalance problem is pervasive in machine learning. To accurately classify the minority class, current methods rely on sampling schemes to close the gap between classes, or on the application of error costs to create algorithms which favor the minority class. Since the sampling schemes and costs must be specified, these methods are highly dependent on the class distributions present in the training set. This makes them difficult to apply in settings where the level of imbalance changes, such as in online streaming data. Often they cannot handle multi-class problems. We present a novel single-class algorithm called Class Conditional Nearest Neighbor Distribution (CCNND), which mitigates the effects of class imbalance through local geometric structure in the data. Our algorithm can be applied seamlessly to problems with any level of imbalance or number of classes, and new examples are simply added to the training set. We show that it performs as well as or better than top sampling and cost-weighting methods on four imbalanced datasets from the UCI Machine Learning Repository, and then apply it to streaming data from the oil and gas industry alongside a modified nearest neighbor algorithm. Our algorithm's competitive performance relative to the state of the art, coupled with its extremely simple implementation and automatic adjustment for minority classes, demonstrates that it is worth further study.
AB - The class imbalance problem is pervasive in machine learning. To accurately classify the minority class, current methods rely on sampling schemes to close the gap between classes, or on the application of error costs to create algorithms which favor the minority class. Since the sampling schemes and costs must be specified, these methods are highly dependent on the class distributions present in the training set. This makes them difficult to apply in settings where the level of imbalance changes, such as in online streaming data. Often they cannot handle multi-class problems. We present a novel single-class algorithm called Class Conditional Nearest Neighbor Distribution (CCNND), which mitigates the effects of class imbalance through local geometric structure in the data. Our algorithm can be applied seamlessly to problems with any level of imbalance or number of classes, and new examples are simply added to the training set. We show that it performs as well as or better than top sampling and cost-weighting methods on four imbalanced datasets from the UCI Machine Learning Repository, and then apply it to streaming data from the oil and gas industry alongside a modified nearest neighbor algorithm. Our algorithm's competitive performance relative to the state of the art, coupled with its extremely simple implementation and automatic adjustment for minority classes, demonstrates that it is worth further study.
UR - http://www.scopus.com/inward/record.url?scp=84865100421&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84865100421&partnerID=8YFLogxK
U2 - 10.1109/IJCNN.2012.6252718
DO - 10.1109/IJCNN.2012.6252718
M3 - Conference contribution
AN - SCOPUS:84865100421
SN - 9781467314909
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - 2012 International Joint Conference on Neural Networks, IJCNN 2012
T2 - 2012 Annual International Joint Conference on Neural Networks, IJCNN 2012, Part of the 2012 IEEE World Congress on Computational Intelligence, WCCI 2012
Y2 - 10 June 2012 through 15 June 2012
ER -