TY - GEN
T1 - Provable variable selection for streaming features
AU - Wang, Jing
AU - Shen, Jie
AU - Li, Ping
N1 - Publisher Copyright:
© 35th International Conference on Machine Learning, ICML 2018. All Rights Reserved.
PY - 2018
Y1 - 2018
AB - In large-scale machine learning applications and high-dimensional statistics, it is ubiquitous to address a considerable number of features among which many are redundant. As a remedy, online feature selection has attracted increasing attention in recent years. It sequentially reveals features and evaluates the importance of them. Though online feature selection has proven an elegant methodology, it is usually challenging to carry out a rigorous theoretical characterization. In this work, we propose a provable online feature selection algorithm that utilizes the online leverage score. The selected features are then fed to k-means clustering, making the clustering step memory and computationally efficient. We prove that with high probability, performing k-means clustering based on the selected feature space does not deviate far from the optimal clustering using the original data. The empirical results on real-world data sets demonstrate the effectiveness of our algorithm.
UR - http://www.scopus.com/inward/record.url?scp=85057327851&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85057327851&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85057327851
T3 - 35th International Conference on Machine Learning, ICML 2018
SP - 8220
EP - 8228
BT - 35th International Conference on Machine Learning, ICML 2018
A2 - Dy, Jennifer
A2 - Krause, Andreas
T2 - 35th International Conference on Machine Learning, ICML 2018
Y2 - 10 July 2018 through 15 July 2018
ER -