Hany Hassan Awadallah

MSR

Graph-based Semi-Supervised Learning of Translation Models from Monolingual Data

Statistical phrase-based translation learns translation rules from bilingual corpora and has traditionally used monolingual evidence only to construct features that rescore existing translation candidates.

In this work, we present a semi-supervised graph-based approach for generating new translation rules that leverages both bilingual and monolingual data. We report results on a large Arabic-English system and a medium-sized Urdu-English system. Our proposed approach significantly improves the performance of competitive phrase-based systems, yielding consistent improvements of 1 to 4 BLEU points on standard evaluation sets.
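The abstract does not spell out the propagation algorithm. As a rough illustration only, here is a minimal sketch of generic iterative label propagation over a similarity graph, assuming nodes are source phrases. Seed phrases carry translation distributions taken from bilingual data and stay fixed, while phrases seen only in monolingual data inherit a weighted, renormalized mix of their neighbors' distributions. The `propagate` function, node names, edge weights and translations are all hypothetical. This is not the authors' implementation.

```python
from collections import defaultdict


def propagate(edges, seeds, iterations=10):
    """Generic label propagation (hypothetical sketch, not the paper's method).

    edges: node -> list of (neighbor, weight) pairs (assumed similarity scores)
    seeds: labeled node -> {translation: probability}; these are never updated
    Returns node -> translation distribution for every node that received one.
    """
    labels = {node: dict(dist) for node, dist in seeds.items()}
    for _ in range(iterations):
        # Seed (bilingual) nodes are clamped to their original distributions.
        updated = {node: dict(dist) for node, dist in seeds.items()}
        for node, neighbors in edges.items():
            if node in seeds:
                continue
            # Weighted sum of the neighbors' distributions from the last round.
            scores = defaultdict(float)
            for neighbor, weight in neighbors:
                for translation, prob in labels.get(neighbor, {}).items():
                    scores[translation] += weight * prob
            total = sum(scores.values())
            if total > 0:
                # Renormalize so the unlabeled node gets a proper distribution.
                updated[node] = {t: s / total for t, s in scores.items()}
        labels = updated
    return labels


# Toy graph: two seed phrases and two phrases seen only in monolingual text.
seeds = {
    "src_a": {"house": 0.8, "home": 0.2},
    "src_b": {"home": 1.0},
}
edges = {
    "src_a": [("oov_x", 0.5)],
    "src_b": [("oov_x", 0.5)],
    "oov_x": [("src_a", 0.5), ("src_b", 0.5), ("oov_y", 1.0)],
    "oov_y": [("oov_x", 1.0)],
}

result = propagate(edges, seeds)
print(result["oov_x"])  # roughly {'house': 0.4, 'home': 0.6}
print(result["oov_y"])  # inherits oov_x's distribution after two hops
```

The second unlabeled node, `oov_y`, has no direct link to any seed. It only receives a distribution once `oov_x` has been labeled, which shows how evidence can spread several hops through monolingual-only parts of a graph.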

