Amittai Axelrod

UW EE / MSR NLP Group

Panlingual Lexical Translation

We'll present a method for performing domain adaptation of machine
translation systems. From the machine translation perspective, more
data is (almost) always better. Because we use all available parallel
data to improve our statistical models, the constructed systems have
relatively broad topical and lexical coverage. This is good in
general, but less so when translating within specific domains
with particular styles or jargon. It is possible for a system trained
on 30,000 in-domain sentences to outperform a general system trained
on 12 million sentences. Given both the in-domain and general-domain
data, how can these two systems be combined to improve in-domain
performance?

