TY - JOUR
T1 - The threshold algorithm
T2 - From middleware systems to the relational engine
AU - Bruno, Nicolas
AU - Wang, Hui
PY - 2007/4
Y1 - 2007/4
N2 - The answer to a top-k query is an ordered set of tuples, where the ordering is based on how closely each tuple matches the query. In the context of middleware systems, new algorithms to answer top-k queries have been recently proposed. Among these, the threshold algorithm (TA) is the most well-known instance due to its simplicity and memory requirements. TA is based on an early-termination condition and can evaluate top-k queries without examining all the tuples. This top-k query model is prevalent not only over middleware systems, but also over plain relational data. In this work, we analyze the challenges that must be addressed to adapt TA to a relational database system. We show that, depending on the available indices, many alternative TA strategies can be used to answer a given query. Choosing the best alternative requires a cost model that can be seamlessly integrated with that of current optimizers. In this work, we address these challenges and conduct an extensive experimental evaluation of the resulting techniques by characterizing which scenarios can take advantage of TA-like algorithms to answer top-k queries in relational database systems
AB - The answer to a top-k query is an ordered set of tuples, where the ordering is based on how closely each tuple matches the query. In the context of middleware systems, new algorithms to answer top-k queries have been recently proposed. Among these, the threshold algorithm (TA) is the most well-known instance due to its simplicity and memory requirements. TA is based on an early-termination condition and can evaluate top-k queries without examining all the tuples. This top-k query model is prevalent not only over middleware systems, but also over plain relational data. In this work, we analyze the challenges that must be addressed to adapt TA to a relational database system. We show that, depending on the available indices, many alternative TA strategies can be used to answer a given query. Choosing the best alternative requires a cost model that can be seamlessly integrated with that of current optimizers. In this work, we address these challenges and conduct an extensive experimental evaluation of the resulting techniques by characterizing which scenarios can take advantage of TA-like algorithms to answer top-k queries in relational database systems
KW - Query processing
KW - Relational databases
KW - Retrieval models
KW - Search process
UR - http://www.scopus.com/inward/record.url?scp=50249153108&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=50249153108&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2007.1011
DO - 10.1109/TKDE.2007.1011
M3 - Article
AN - SCOPUS:50249153108
SN - 1041-4347
VL - 19
SP - 523
EP - 537
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 4
M1 - 4118709
ER -