TY - JOUR
T1 - Automatic Taxonomy Construction from Keywords via Scalable Bayesian Rose Trees
AU - Song, Yangqiu
AU - Liu, Shixia
AU - Liu, Xueqing
AU - Wang, Haixun
N1 - Publisher Copyright:
© 1989-2012 IEEE.
PY - 2015/7/1
Y1 - 2015/7/1
N2 - In this paper, we study a challenging problem of deriving a taxonomy from a set of keyword phrases. A solution can benefit many real-world applications because i) keywords give users the flexibility and ease to characterize a specific domain; and ii) in many applications, such as online advertisements, the domain of interest is already represented by a set of keywords. However, it is impossible to create a taxonomy out of a keyword set itself. We argue that additional knowledge and context are needed. To this end, we first use a general-purpose knowledgebase and keyword search to supply the required knowledge and context. Then, we develop a Bayesian approach to build a hierarchical taxonomy for a given set of keywords. We reduce the complexity of previous hierarchical clustering approaches from O(n2 log n) to O(n log n) using a nearest-neighbor-based approximation, so that we can derive a domain-specific taxonomy from one million keyword phrases in less than an hour. Finally, we conduct comprehensive large scale experiments to show the effectiveness and efficiency of our approach. A real life example of building an insurance-related web search query taxonomy illustrates the usefulness of our approach for specific domains.
AB - In this paper, we study a challenging problem of deriving a taxonomy from a set of keyword phrases. A solution can benefit many real-world applications because i) keywords give users the flexibility and ease to characterize a specific domain; and ii) in many applications, such as online advertisements, the domain of interest is already represented by a set of keywords. However, it is impossible to create a taxonomy out of a keyword set itself. We argue that additional knowledge and context are needed. To this end, we first use a general-purpose knowledgebase and keyword search to supply the required knowledge and context. Then, we develop a Bayesian approach to build a hierarchical taxonomy for a given set of keywords. We reduce the complexity of previous hierarchical clustering approaches from O(n2 log n) to O(n log n) using a nearest-neighbor-based approximation, so that we can derive a domain-specific taxonomy from one million keyword phrases in less than an hour. Finally, we conduct comprehensive large scale experiments to show the effectiveness and efficiency of our approach. A real life example of building an insurance-related web search query taxonomy illustrates the usefulness of our approach for specific domains.
KW - Bayesian Rose Tree
KW - Hierarchical Clustering
KW - Keyword Taxonomy Building
KW - Short Text Conceptualization
UR - http://www.scopus.com/inward/record.url?scp=84959507482&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84959507482&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2015.2397432
DO - 10.1109/TKDE.2015.2397432
M3 - Article
AN - SCOPUS:84959507482
SN - 1041-4347
VL - 27
SP - 1861
EP - 1874
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 7
M1 - 7029112
ER -