Hisami Suzuki and Kristina Toutanova

Microsoft

Generating Morphologically Rich Languages in MT

UW/Microsoft Symposium, 2/16/07

We study the use of rich syntax-based statistical models in MT for generating morphologically complex target languages. We first describe the statistical framework for incorporating both lexical and morphological/syntactic information from the source and target languages into the treelet-based MT system (Quirk et al., 2005). We then report experimental results from applying such models to three typologically divergent target languages, Japanese, Russian and Arabic, with English as the source language in all cases. Though some experiments are still to be conducted, the results obtained so far strongly suggest that the use of such specialized models in MT significantly improves target language generation, according to both the BLEU measure and human evaluation.
