Sentence segmentation using IBM word alignment model 1

Jia Xu, Richard Zens, Hermann Ney

Research output: Contribution to conference › Paper › peer-review

27 Scopus citations

Abstract

In statistical machine translation, word alignment models are trained on bilingual corpora. Long sentences pose two severe problems: (1) high computational requirements and (2) poor quality of the resulting word alignment. We present a sentence-segmentation method that addresses both problems by splitting long sentence pairs. Our approach uses lexicon information to locate the optimal split point. The method is evaluated on two Chinese-English translation tasks in the news domain. We show that segmenting long sentences before training significantly improves the final translation quality of a state-of-the-art machine translation system. In one of the tasks, we achieve a relative improvement in BLEU score of more than 20%.
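The abstract does not spell out the algorithm, so the following is only a minimal sketch of how lexicon information could be used to pick a split point. It assumes a toy lexicon table `p(f | e)`, a Model-1-style score in which each source word is averaged over the target words of its segment, and monotone splits only. The function names, the `floor` smoothing value, and the exhaustive search are illustrative assumptions, not the authors' published method.

```python
import math

def model1_log_prob(src, tgt, lexicon, floor=1e-6):
    """Model-1-style log score of src given tgt.

    Each source word's probability is the average of p(f | e) over all
    target words in the segment. `floor` is an assumed smoothing value
    for word pairs missing from the lexicon.
    """
    if not src or not tgt:
        return float("-inf")
    total = 0.0
    for f in src:
        avg = sum(lexicon.get((f, e), floor) for e in tgt) / len(tgt)
        total += math.log(avg)
    return total

def best_monotone_split(src, tgt, lexicon):
    """Try every split (j, i) and return (score, j, i) for the best one.

    The pair is cut into (src[:j], tgt[:i]) and (src[j:], tgt[i:]);
    the score is the sum of the two segment scores.
    """
    best = None
    for j in range(1, len(src)):
        for i in range(1, len(tgt)):
            score = (model1_log_prob(src[:j], tgt[:i], lexicon)
                     + model1_log_prob(src[j:], tgt[i:], lexicon))
            if best is None or score > best[0]:
                best = (score, j, i)
    return best

# Hypothetical toy data: keys are (source_word, target_word) pairs.
toy_lexicon = {
    ("a", "A"): 0.9, ("b", "B"): 0.9, ("b", "A"): 0.3,
    ("c", "C"): 0.9, ("d", "D"): 0.9,
}
score, j, i = best_monotone_split(list("abcd"), list("ABCD"), toy_lexicon)
print(f"split source at {j}, target at {i}, log score {score:.3f}")
```

A split that separates a source word from every target word it can translate to falls back to the `floor` probability and is heavily penalised, which is what pushes the search toward split points consistent with the lexicon.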

Original language: English
Pages: 280-287
Number of pages: 8
State: Published - 2005
Event: 10th Annual Conference on European Association for Machine Translation, EAMT 2005 - Budapest, Hungary
Duration: 30 May 2005 to 31 May 2005
