TY - JOUR
T1 - Efficient active learning of sparse halfspaces with arbitrary bounded noise
AU - Zhang, Chicheng
AU - Shen, Jie
AU - Awasthi, Pranjal
N1 - Publisher Copyright:
© 2020 Neural information processing systems foundation. All rights reserved.
PY - 2020
Y1 - 2020
AB - We study active learning of homogeneous s-sparse halfspaces in ℝ^d under the setting where the unlabeled data distribution is isotropic log-concave and each label is flipped with probability at most η for a parameter η ∈ [0, 1/2), known as the bounded noise. Even in the presence of mild label noise, i.e. η is a small constant, this is a challenging problem and only recently have label complexity bounds of the form Õ(s · polylog(d, 1/ε)) been established in [Zhang, 2018] for computationally efficient algorithms. In contrast, under high levels of label noise, the label complexity bounds achieved by computationally efficient algorithms are much worse: the best known result [Awasthi et al., 2016] provides a computationally efficient algorithm with label complexity Õ((s ln d/ε)^(2^poly(1/(1−2η)))), which is label-efficient only when the noise rate η is a fixed constant. In this work, we substantially improve on it by designing a polynomial time algorithm for active learning of s-sparse halfspaces, with a label complexity of Õ(s/(1−2η)^4 · polylog(d, 1/ε)). This is the first efficient algorithm with label complexity polynomial in 1/(1−2η) in this setting, which is label-efficient even for η arbitrarily close to 1/2. Our active learning algorithm and its theoretical guarantees also immediately translate to new state-of-the-art label and sample complexity results for full-dimensional active and passive halfspace learning under arbitrary bounded noise.
UR - http://www.scopus.com/inward/record.url?scp=85107890809&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85107890809&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85107890809
SN - 1049-5258
VL - 2020-December
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 34th Conference on Neural Information Processing Systems, NeurIPS 2020
Y2 - 6 December 2020 through 12 December 2020
ER -