Automatic Taxonomy Construction from Keywords via Scalable Bayesian Rose Trees

Yangqiu Song, Shixia Liu, Xueqing Liu, Haixun Wang

Research output: Contribution to journal, Article, peer-review

22 Scopus citations

Abstract

In this paper, we study the challenging problem of deriving a taxonomy from a set of keyword phrases. A solution can benefit many real-world applications because (i) keywords give users a flexible and easy way to characterize a specific domain; and (ii) in many applications, such as online advertising, the domain of interest is already represented by a set of keywords. However, it is impossible to create a taxonomy from a keyword set alone. We argue that additional knowledge and context are needed. To this end, we first use a general-purpose knowledge base and keyword search to supply the required knowledge and context. Then, we develop a Bayesian approach to build a hierarchical taxonomy for a given set of keywords. We reduce the complexity of previous hierarchical clustering approaches from O(n^2 log n) to O(n log n) using a nearest-neighbor-based approximation, so that we can derive a domain-specific taxonomy from one million keyword phrases in less than an hour. Finally, we conduct comprehensive large-scale experiments to show the effectiveness and efficiency of our approach. A real-life example of building an insurance-related web search query taxonomy illustrates the usefulness of our approach for specific domains.
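To give a rough sense of what the complexity reduction in the abstract means at the quoted scale of one million keyword phrases, the sketch below compares the two growth terms directly. It is only an illustration of the asymptotic expressions: the function names are hypothetical, constant factors and the actual cost of the Bayesian merge computations are ignored, and the base-2 logarithm is an assumption.

```python
import math


def pairwise_cost(n: int) -> float:
    """Growth term n^2 * log2(n): every pair of clusters is a merge candidate."""
    return n * n * math.log2(n)


def neighbor_cost(n: int) -> float:
    """Growth term n * log2(n): only nearby clusters are merge candidates."""
    return n * math.log2(n)


n = 1_000_000  # keyword-set size quoted in the abstract
ratio = pairwise_cost(n) / neighbor_cost(n)

print(f"n^2 log n ~ {pairwise_cost(n):.2e}")
print(f"n log n   ~ {neighbor_cost(n):.2e}")
print(f"ratio     ~ {ratio:.2e}")
```

The ratio of the two terms is simply n, so at one million keywords the nearest-neighbor approximation removes roughly six orders of magnitude from the growth term, which is consistent with the abstract's claim that a taxonomy at this scale becomes feasible in under an hour.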

Original language: English
Article number: 7029112
Pages (from-to): 1861-1874
Number of pages: 14
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 27
Issue number: 7
State: Published - 1 Jul 2015

Keywords

  • Bayesian Rose Tree
  • Hierarchical Clustering
  • Keyword Taxonomy Building
  • Short Text Conceptualization
