Query translation is a crucial step in cross-language information retrieval (CLIR), which aims to identify documents written in a language different from that of the query. Machine translation (MT) can be a convenient tool for this step, but we argue that it is not sufficient. The goal of query translation differs from that of MT: query translation aims to identify all the possible expressions of the original query in the target language, and those expressions are not limited to strict translations. By developing approaches specific to query translation, we can obtain better CLIR effectiveness. As an example, we describe a study on query translation using statistical translation models trained on parallel pages automatically mined from the web. Our experiments show that queries translated in this way produce better retrieval results than queries translated with a high-quality MT system.
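The abstract does not describe how the trained model is applied to a query. The following is a minimal sketch that assumes the statistical translation model has been reduced to a word-level table of probabilities P(target | source). It illustrates the contrast drawn above: rather than committing to one MT output, the query becomes a weighted set of several target-language expressions. The function `translate_query`, its parameters `k` and `min_prob`, the pass-through rule for unknown terms, and the toy French-to-English probabilities in `TABLE` are all hypothetical and do not come from the study.

```python
from collections import defaultdict

# Hypothetical word-level translation table P(target | source). The values are
# invented for illustration; a real table would be estimated from parallel text.
TABLE: dict[str, dict[str, float]] = {
    "pollution": {"pollution": 0.8, "contamination": 0.15, "pollutant": 0.05},
    "eau": {"water": 0.9, "waters": 0.06, "aqueous": 0.04},
}


def translate_query(query, table, k=3, min_prob=0.1):
    """Turn source-language query terms into a weighted bag of target terms.

    For each source term, keep up to k of its most probable translations whose
    probability is at least min_prob, and add that probability to the target
    term's weight. A term missing from the table is passed through unchanged
    with weight 1.0 (an assumed fallback, e.g. for proper names).
    """
    weights = defaultdict(float)
    for term in query:
        candidates = table.get(term)
        if not candidates:
            # Assumed fallback: keep untranslatable terms as-is.
            weights[term] += 1.0
            continue
        # Rank translations by probability, most likely first.
        ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
        for target, prob in ranked[:k]:
            if prob >= min_prob:
                weights[target] += prob
    return dict(weights)
```

For instance, `translate_query(["pollution", "eau", "Rhin"], TABLE, k=2)` returns `{"pollution": 0.8, "contamination": 0.15, "water": 0.9, "Rhin": 1.0}`. The related term "contamination" survives alongside the strict translation, while low-probability candidates such as "waters" are dropped. The resulting weights could then be used directly as query-term weights in a retrieval model.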