Learning multilingual topics from incomparable corpora

Shudong Hao, Michael J. Paul

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

14 Scopus citations

Abstract

Multilingual topic models enable crosslingual tasks by extracting consistent topics from multilingual corpora. Most models require parallel or comparable training corpora, which limits their ability to generalize. In this paper, we first demystify the knowledge transfer mechanism behind multilingual topic models by defining an alternative but equivalent formulation. Based on this analysis, we then relax the assumption of training data required by most existing models, creating a model that only requires a dictionary for training. Experiments show that our new method effectively learns coherent multilingual topics from partially and fully incomparable corpora with limited amounts of dictionary resources.
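The abstract describes transferring topic knowledge across languages using only a dictionary. As a rough illustration of the general idea (not the paper's actual model), the toy sketch below projects a topic's word distribution from one language to another through a bilingual dictionary, splitting probability mass among multiple translations; all dictionary entries and probabilities are made up for illustration.

```python
# Toy illustration of dictionary-based topic transfer (not the paper's model).
# A bilingual dictionary (English -> Spanish, toy entries).
bilingual_dict = {
    "dog": ["perro"],
    "cat": ["gato"],
    "run": ["correr"],
}

# A topic represented as a distribution over English words (toy values).
topic_en = {"dog": 0.5, "cat": 0.3, "run": 0.2}

def project_topic(topic, dictionary):
    """Move each word's probability mass onto its translations,
    splitting mass evenly among multiple translations, then renormalize."""
    projected = {}
    for word, prob in topic.items():
        translations = dictionary.get(word, [])
        for t in translations:
            projected[t] = projected.get(t, 0.0) + prob / len(translations)
    total = sum(projected.values())
    return {w: p / total for w, p in projected.items()} if total else {}

topic_es = project_topic(topic_en, bilingual_dict)
# topic_es is a valid distribution over the Spanish vocabulary.
```

This only sketches why a dictionary suffices as a bridge: the linked vocabularies let topic mass in one language constrain the corresponding topic in the other, without any parallel or comparable documents.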

Original language: English
Title of host publication: COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings
Editors: Emily M. Bender, Leon Derczynski, Pierre Isabelle
Pages: 2595-2609
Number of pages: 15
ISBN (Electronic): 9781948087506
State: Published - 2018
Event: 27th International Conference on Computational Linguistics, COLING 2018 - Santa Fe, United States
Duration: 20 Aug 2018 - 26 Aug 2018

Publication series

Name: COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings

Conference

Conference: 27th International Conference on Computational Linguistics, COLING 2018
Country/Territory: United States
City: Santa Fe
Period: 20/08/18 - 26/08/18
