Building Ontologies from Collaborative Knowledge Bases to Search and Interpret Multilingual Corpora

Yegin Genc, Elizabeth A. Lennon, Winter Mason, Jeffrey V. Nickerson

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Tools and techniques that automate the interpretation of multilingual corpora are useful on many fronts; for example, scholars could use such tools to more readily pinpoint relevant articles from journals in a wide variety of languages. This work describes techniques to build and characterize ontologies using collaborative knowledge bases such as Wikipedia. These ontologies can then be used to search and classify texts. The approach was originally developed for monolingual corpora; we extend it to multilingual texts and test the methods on Mandarin scientific abstracts. The presented techniques provide a novel and efficient mechanism to obtain contextually rich ontologies and measure document relevancy within multilingual corpora.

Original language: English
Title of host publication: 6th Workshop on Building and Using Comparable Corpora, BUCC 2013 at the 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013 - Proceedings
Editors: Serge Sharoff, Pierre Zweigenbaum, Reinhard Rapp
Pages: 87-94
Number of pages: 8
ISBN (Electronic): 9781937284602
State: Published, 2013
Event: 6th Workshop on Building and Using Comparable Corpora, BUCC 2013 at the 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013 - Sofia, Bulgaria
Duration: 8 Aug 2013 → …

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print): 0736-587X

Conference

Conference: 6th Workshop on Building and Using Comparable Corpora, BUCC 2013 at the 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013
Country/Territory: Bulgaria
City: Sofia
Period: 8/08/13 → …
