Amittai Axelrod

Topic Modeling for Statistical Machine Translation

Unsupervised topic models can be used effectively in language modeling and information retrieval to tailor system performance to broad corpora by identifying clusters of related data. We combine such a topic model, based on Latent Dirichlet Allocation, with our recent work on corpus sub-selection to improve machine translation results on a variety of TED talks.

