TY - GEN
T1 - Analyzing Bayesian crosslingual transfer in topic models
AU - Hao, Shudong
AU - Paul, Michael J.
N1 - Publisher Copyright: © 2019 Association for Computational Linguistics
PY - 2019
Y1 - 2019
N2 - We introduce a theoretical analysis of crosslingual transfer in probabilistic topic models. By formulating posterior inference through Gibbs sampling as a process of language transfer, we propose a new measure that quantifies the loss of knowledge across languages during this process. This measure enables us to derive a PAC-Bayesian bound that elucidates the factors affecting model quality, both during training and in downstream applications. We provide experimental validation of the analysis on a diverse set of five languages, and discuss best practices for data collection and model design based on our analysis.
UR - http://www.scopus.com/inward/record.url?scp=85083077042&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85083077042&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85083077042
T3 - NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference
SP - 1551
EP - 1565
BT - Long and Short Papers
T2 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2019
Y2 - 2 June 2019 through 7 June 2019
ER -