Abstract
In statistical machine translation, word alignment models are trained on bilingual corpora. Long sentences pose two severe problems: (1) high computational requirements and (2) poor quality of the resulting word alignment. We present a sentence-segmentation method that solves these problems by splitting long sentence pairs. Our approach uses lexicon information to locate the optimal split point. The method is evaluated on two Chinese-English translation tasks in the news domain. We show that segmenting long sentences before training significantly improves the final translation quality of a state-of-the-art machine translation system. In one of the tasks, we achieve a relative improvement in BLEU score of more than 20%.
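The split-point search described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the scoring function, so everything here is an assumption: the lexicon is a hypothetical table of word-pair probabilities, each block is scored by averaging those probabilities over the target words of the block, and only monotone splits are considered (the first source part is paired with the first target part). The names `block_score`, `best_split`, and the `floor` smoothing value are illustrative, not taken from the paper.

```python
import math

def block_score(src, tgt, lexicon, floor=1e-6):
    """Log-score of source words given target words (assumed scoring)."""
    if not src or not tgt:
        return float("-inf")
    total = 0.0
    for f in src:
        # Average lexicon probability of f over all target words in the block;
        # unseen pairs fall back to a small floor value (an assumption).
        p = sum(lexicon.get((f, e), floor) for e in tgt) / len(tgt)
        total += math.log(p)
    return total

def best_split(src, tgt, lexicon):
    """Return (i, j, score) for the best monotone split.

    The pair is split into (src[:i], tgt[:j]) and (src[i:], tgt[j:]).
    """
    best = (None, None, float("-inf"))
    for i in range(1, len(src)):
        for j in range(1, len(tgt)):
            score = (block_score(src[:i], tgt[:j], lexicon)
                     + block_score(src[i:], tgt[j:], lexicon))
            if score > best[2]:
                best = (i, j, score)
    return best

# Toy data with placeholder tokens: each source word matches one target word.
src = ["f1", "f2", "f3", "f4"]
tgt = ["e1", "e2", "e3", "e4"]
lexicon = {("f1", "e1"): 0.9, ("f2", "e2"): 0.9,
           ("f3", "e3"): 0.9, ("f4", "e4"): 0.9}

i, j, score = best_split(src, tgt, lexicon)
print(i, j)  # split indices for the toy pair: 2 2
```

The exhaustive search costs one block evaluation per candidate pair of positions, which is acceptable for a sketch; applying the split recursively until every segment falls below a length limit would be one way to handle very long sentence pairs, though the abstract does not say whether the authors do this.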
Original language | English |
---|---|
Pages | 280-287 |
Number of pages | 8 |
State | Published - 2005 |
Event | 10th Annual Conference on European Association for Machine Translation, EAMT 2005 - Budapest, Hungary; Duration: 30 May 2005 → 31 May 2005 |
Conference
Conference | 10th Annual Conference on European Association for Machine Translation, EAMT 2005 |
---|---|
Country/Territory | Hungary |
City | Budapest |
Period | 30/05/05 → 31/05/05 |