Marcus Sammer

Joint Work with Oren Etzioni, Kobi Reiter, and Stephen Soderland

UW, Turing Center

Panlingual Lexical Translation

UW/Microsoft Symposium, 04/20/07

Lexical translation is the task of translating individual words or phrases. It supports applications such as cross-lingual search, metadata translation, and knowledge-based translation. The Turing Center has two lexical translation projects that aim to scale lexical translation to a very large number of language pairs.

The PanImages project has built a cross-lingual image search engine for the Web. Lexical translation in PanImages relies on the translation graph, a massive lexical resource in which each node denotes a word in some language and each edge denotes a word sense shared by a pair of words. The graph is constructed automatically from machine-readable dictionaries and Wiktionaries.
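To make the translation graph concrete, here is a minimal Python sketch of such a structure. It is an illustration only, not the PanImages implementation: the class name `TranslationGraph`, the sense labels, and the toy entries are all hypothetical, and it assumes that each edge already carries a sense identifier that is comparable across dictionaries, which a real system would have to establish.

```python
from collections import defaultdict


class TranslationGraph:
    """Toy translation graph: nodes are (word, language) pairs,
    edges are labeled with the word sense the two words share."""

    def __init__(self):
        # node -> neighbor node -> set of shared sense ids (hypothetical labels)
        self._edges = defaultdict(lambda: defaultdict(set))

    def add_edge(self, a, b, sense_id):
        """Record that words a and b share the sense sense_id."""
        self._edges[a][b].add(sense_id)
        self._edges[b][a].add(sense_id)

    def translate(self, source, target_lang):
        """Return target-language words reachable from source along
        paths whose edges all carry one and the same sense id."""
        results = set()
        neighbors = self._edges.get(source, {})
        source_senses = {s for senses in neighbors.values() for s in senses}
        for sense in source_senses:
            seen = {source}
            stack = [source]
            while stack:
                node = stack.pop()
                for nbr, senses in self._edges.get(node, {}).items():
                    if sense in senses and nbr not in seen:
                        seen.add(nbr)
                        stack.append(nbr)
                        if nbr[1] == target_lang:
                            results.add(nbr[0])
        return results


# Toy data standing in for English-French and French-German dictionaries.
graph = TranslationGraph()
graph.add_edge(("spring", "en"), ("printemps", "fr"), "season")
graph.add_edge(("printemps", "fr"), ("Frühling", "de"), "season")
graph.add_edge(("spring", "en"), ("ressort", "fr"), "coil")
graph.add_edge(("ressort", "fr"), ("Feder", "de"), "coil")
graph.add_edge(("Feder", "de"), ("plume", "fr"), "feather")

german = graph.translate(("spring", "en"), "de")
french = graph.translate(("spring", "en"), "fr")
```

In this toy graph there is no English-German dictionary, yet `german` contains both German words because each is reached along a path with a single consistent sense. The "feather" edge is never followed from "spring", so "plume" does not appear in `french`.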

The PanLexicon project uses bilingual lexicons and the contexts of words in monolingual corpora to find translation sets: sets of words that share the same word sense across multiple languages. By maintaining word sense distinctions, PanLexicon finds translations between language pairs that are not covered by any of its bilingual source dictionaries.

My talk will describe the complementary approaches taken by the two projects and report on our empirical results.

