Will Lewis

Microsoft Research, Machine Translation Group

Haitian Creole: How to Build and Ship an MT Engine from Scratch in 4 days, 17 hours, & 30 minutes

We relate the effort of the Microsoft Translator team to develop a Haitian Creole statistical machine translation engine from scratch in a matter of days. Haitian Creole presents a number of difficulties for developing an SMT system, principal among these is the lack of significant amounts of parallel training data and an inconsistent orthography, both of which lead to data sparseness. We demonstrate, however, that it is possible to build a translation engine of reasonable quality over very little data by engaging with the native language community and reducing data sparseness in creative ways. As such, we show that MT as a technology and as a service can be deployed rapidly in crisis situations.

Back to symposium main page