Tony Fader

UW CSE / MSR Machine Learning Group

Long Query Understanding with Wikipedia

Web search engines suffer from the long query problem: retrieval
performance is generally lower for queries containing many terms. Long
queries often contain extraneous terms that can mislead the search
engine. Further, long queries are rare, so implicit feedback from user
interaction logs is not available for them. Previous work on the long
query problem has focused on query reduction, where a subset of terms
is dropped in the hope of improving retrieval performance. In this
work, we propose an
alternative framework for long query rewriting that leverages
Wikipedia as background knowledge. Our framework involves mapping
long queries to Wikipedia entities, using session logs as a source of
context, and then choosing a subset of entities that best captures
the query's meaning. We provide preliminary results showing that this
method improves performance on a random sample of long queries.
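
As a rough illustration of the three steps named in the abstract (entity
mapping, session context, entity selection), the following is a minimal
sketch and not the authors' implementation. The phrase index, the
per-entity term sets, the overlap-based score, and all function names are
hypothetical stand-ins; a real system would presumably derive them from
Wikipedia and from search session logs.

```python
from typing import Dict, List, Set

# Hypothetical toy index: surface phrase -> Wikipedia-style entity title.
PHRASE_TO_ENTITY: Dict[str, str] = {
    "jaguar": "Jaguar Cars",
    "oil change": "Motor oil",
    "best way": "Best practice",
}

# Hypothetical terms associated with each entity (e.g. from article text).
ENTITY_TERMS: Dict[str, Set[str]] = {
    "Jaguar Cars": {"car", "vehicle", "engine", "dealer"},
    "Motor oil": {"engine", "car", "lubricant"},
    "Best practice": {"management", "method"},
}


def link_entities(query: str, max_len: int = 3) -> List[str]:
    """Map a query to entities by greedy longest-phrase matching."""
    tokens = query.lower().split()
    entities: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in PHRASE_TO_ENTITY:
                entity = PHRASE_TO_ENTITY[phrase]
                if entity not in entities:
                    entities.append(entity)
                i += n
                break
        else:
            i += 1  # no known phrase starts here; skip this token
    return entities


def context_score(entity: str, session_queries: List[str]) -> int:
    """Count session terms that also appear in the entity's term set."""
    session_terms = {t for q in session_queries for t in q.lower().split()}
    return len(ENTITY_TERMS.get(entity, set()) & session_terms)


def select_entities(query: str, session_queries: List[str], k: int = 2) -> List[str]:
    """Keep the k candidate entities best supported by the session context."""
    candidates = link_entities(query)
    ranked = sorted(
        candidates,
        key=lambda e: context_score(e, session_queries),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    session = ["used car dealer", "car engine noise"]
    query = "what is the best way to do an oil change on my jaguar"
    print(select_entities(query, session))
```

In this toy setup, earlier queries in the session mention cars and
engines, so the car-related entities outrank the generic phrase "best
way", which is then dropped from the rewritten query.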
