TY - JOUR
T1 - Semisupervised Feature Selection Based on Relevance and Redundancy Criteria
AU - Xu, Jin
AU - Tang, Bo
AU - He, Haibo
AU - Man, Hong
N1 - Publisher Copyright: © 2012 IEEE.
PY - 2017/9
Y1 - 2017/9
AB - Feature selection aims to retain relevant features for improved classification performance and to remove redundant features for reduced computational cost. Balancing these two factors is difficult, especially when categorical labels are costly to obtain. In this paper, we address this problem with a semisupervised learning method and propose a max-relevance and min-redundancy criterion based on Pearson's correlation coefficient (RRPC). The method uses an incremental search technique to select optimal feature subsets. Each newly selected feature has strong relevance to the labels, assessed in a supervised manner, and avoids redundancy with the already selected feature subset under unsupervised constraints. Comparative studies are performed on binary and multicategory data from benchmark data sets. The results show that RRPC achieves a good balance between relevance and redundancy in semisupervised feature selection. We also compare RRPC with classic supervised feature selection criteria (such as mRMR and Fisher score), unsupervised feature selection criteria (such as Laplacian score), and semisupervised feature selection criteria (such as sSelect and locality sensitive). Experimental results demonstrate the effectiveness of our method.
KW - Feature selection
KW - Pearson correlation coefficients
KW - machine learning
KW - max-relevance
KW - min-redundancy
KW - semisupervised learning
UR - http://www.scopus.com/inward/record.url?scp=84971405858&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84971405858&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2016.2562670
DO - 10.1109/TNNLS.2016.2562670
M3 - Article
C2 - 28113443
AN - SCOPUS:84971405858
SN - 2162-237X
VL - 28
SP - 1974
EP - 1984
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 9
M1 - 7475902
ER -