TY - JOUR
T1 - Fast spectral analysis for approximate nearest neighbor search
AU - Wang, Jing
AU - Shen, Jie
N1 - Publisher Copyright:
© 2021, The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature.
PY - 2022/6
Y1 - 2022/6
N2 - In large-scale machine learning, a problem of central interest is approximate nearest neighbor (ANN) search, where the goal is to query points that are close to a given object under a certain metric. In this paper, we develop a novel data-driven ANN search algorithm in which the data structure is learned by a fast spectral technique based on s landmarks selected by approximate ridge leverage scores. We show that with overwhelming probability, our algorithm returns the (1 + ϵ/4)-ANN for any approximation parameter ϵ ∈ (0, 1). A remarkable feature of our algorithm is its computational efficiency: learning k-length hash codes requires O((s³ + ns²) log n) running time and O(d²) extra space, and returning the (1 + ϵ/4)-ANN of a query needs O(k log n) running time. Experimental results on computer vision and natural language understanding tasks demonstrate the significant advantage of our algorithm over state-of-the-art methods.
AB - In large-scale machine learning, a problem of central interest is approximate nearest neighbor (ANN) search, where the goal is to query points that are close to a given object under a certain metric. In this paper, we develop a novel data-driven ANN search algorithm in which the data structure is learned by a fast spectral technique based on s landmarks selected by approximate ridge leverage scores. We show that with overwhelming probability, our algorithm returns the (1 + ϵ/4)-ANN for any approximation parameter ϵ ∈ (0, 1). A remarkable feature of our algorithm is its computational efficiency: learning k-length hash codes requires O((s³ + ns²) log n) running time and O(d²) extra space, and returning the (1 + ϵ/4)-ANN of a query needs O(k log n) running time. Experimental results on computer vision and natural language understanding tasks demonstrate the significant advantage of our algorithm over state-of-the-art methods.
KW - Approximate nearest neighbor search
KW - Hashing
KW - Noise
KW - Spectral analysis
KW - Subspace
UR - http://www.scopus.com/inward/record.url?scp=85122380449&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85122380449&partnerID=8YFLogxK
U2 - 10.1007/s10994-021-06124-1
DO - 10.1007/s10994-021-06124-1
M3 - Article
AN - SCOPUS:85122380449
SN - 0885-6125
VL - 111
SP - 2297
EP - 2322
JO - Machine Learning
JF - Machine Learning
IS - 6
ER -