Colin Cherry

Microsoft Research - NLP Group

Cohesive Phrase-based Decoding for Statistical Machine Translation

UW/Microsoft Symposium, 02/15/08

Phrase-based decoding produces state-of-the-art translations with no regard for syntax. We add syntax to this process with a cohesion constraint based on a dependency tree for the source sentence. The constraint allows the decoder to employ arbitrary, non-syntactic phrases, but ensures that those phrases are translated in an order that respects the source tree’s structure. In this way, we target the phrasal decoder’s weakness in modeling word order without affecting its strengths. To further increase flexibility, we incorporate cohesion as a decoder feature, creating a soft constraint. The resulting cohesive, phrase-based decoder is shown to produce translations that are preferred over non-cohesive output in both automatic and human evaluations.
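As a rough illustration of the idea in the abstract, the sketch below checks whether a phrase translation order respects a source dependency tree. It is a simplified reading of cohesion, not the decoder's actual algorithm: it assumes a translation is cohesive when, for every subtree, the phrases touching that subtree are translated consecutively. The names `subtree_nodes`, `count_violations`, and `is_cohesive`, the head-array encoding of the tree, and the violation count used as a stand-in for a soft-constraint feature are all hypothetical.

```python
def subtree_nodes(heads):
    """Return, for each word, the set of word indices in its subtree.

    heads[i] is the index of word i's head, or -1 for the root (assumed encoding).
    """
    subtrees = [{i} for i in range(len(heads))]
    for i in range(len(heads)):
        j = heads[i]
        while j != -1:          # add word i to every ancestor's subtree
            subtrees[j].add(i)
            j = heads[j]
    return subtrees


def count_violations(heads, phrase_order):
    """Count subtrees whose translation is interrupted.

    phrase_order lists inclusive (start, end) source spans in the order
    they are translated. A subtree is interrupted when a phrase lying
    wholly outside it is translated between two phrases that touch it.
    """
    violations = 0
    for subtree in subtree_nodes(heads):
        touching = [
            k for k, (start, end) in enumerate(phrase_order)
            if any(w in subtree for w in range(start, end + 1))
        ]
        # Touching phrases must occupy consecutive slots in the order.
        if touching and touching[-1] - touching[0] + 1 != len(touching):
            violations += 1
    return violations


def is_cohesive(heads, phrase_order):
    """Hard constraint: no subtree may be interrupted."""
    return count_violations(heads, phrase_order) == 0


# Toy source sentence: "the red car stopped quickly"
# the -> car, red -> car, car -> stopped, stopped = root, quickly -> stopped
heads = [2, 2, 3, -1, 3]

monotone = [(0, 1), (2, 2), (3, 3), (4, 4)]
interrupted = [(0, 1), (3, 3), (2, 2), (4, 4)]   # leaves "the red car" early
non_syntactic = [(0, 1), (2, 3), (4, 4)]         # "car stopped" is not a constituent
```

In this toy example the `interrupted` order translates "stopped" before the subtree headed by "car" is finished, so it is rejected under the hard constraint, while the `non_syntactic` order is accepted even though its middle phrase crosses a subtree boundary. Using `count_violations` as a weighted score rather than a filter is one way to picture the soft constraint.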

Colin Cherry is a new researcher at MSR's NLP group, working primarily in machine translation. He received his PhD from the University of Alberta in 2007, studying under Dekang Lin. The above talk is drawn from the final chapter of his thesis work.
