TY - JOUR
T1 - Direction-Projection-Permutation for High-Dimensional Hypothesis Tests
AU - Wei, Susan
AU - Lee, Chihoon
AU - Wichers, Lindsay
AU - Marron, J. S.
N1 - Publisher Copyright:
© 2016 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.
PY - 2016/4/2
Y1 - 2016/4/2
N2 - High-dimensional low sample size (HDLSS) data are becoming increasingly common in statistical applications. When the data can be partitioned into two classes, a basic task is to construct a classifier that can assign objects to the correct class. Binary linear classifiers have been shown to be especially useful in HDLSS settings and preferable to more complicated classifiers because of their ease of interpretability. We propose a computational tool called direction-projection-permutation (DiProPerm), which rigorously assesses whether a binary linear classifier is detecting statistically significant differences between two high-dimensional distributions. The basic idea behind DiProPerm involves working directly with the one-dimensional projections of the data induced by the binary linear classifier. Theoretical properties of DiProPerm are studied under the HDLSS asymptotic regime whereby dimension diverges to infinity while sample size remains fixed. We show that certain variations of DiProPerm are consistent and that consistency is a nontrivial property of tests in the HDLSS asymptotic regime. The practical utility of DiProPerm is demonstrated on HDLSS gene expression microarray datasets. Finally, an empirical power study is conducted comparing DiProPerm to several alternative two-sample HDLSS tests to understand the advantages and disadvantages of each method.
KW - Distance weighted discrimination
KW - High-dimensional hypothesis test
KW - High-dimensional low sample size
KW - Linear binary classification
KW - Permutation test
KW - Two-sample problem
UR - http://www.scopus.com/inward/record.url?scp=84971389584&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84971389584&partnerID=8YFLogxK
U2 - 10.1080/10618600.2015.1027773
DO - 10.1080/10618600.2015.1027773
M3 - Article
AN - SCOPUS:84971389584
SN - 1061-8600
VL - 25
SP - 549
EP - 569
JO - Journal of Computational and Graphical Statistics
JF - Journal of Computational and Graphical Statistics
IS - 2
ER -