Michael J. Cafarella & Oren Etzioni


A Search Engine for Natural Language Applications

UW/Microsoft Symposium, 4/07/06

Many modern natural language processing applications use search engines to locate large numbers of Web documents or to compute statistics over the Web corpus. Yet Web search engines are designed and optimized for simple human queries, and they are not well suited to supporting such applications. As a result, these applications must issue millions of successive queries, which places unnecessary load on the search engine and leaves the applications slow and limited in scalability.

In response, we introduce the Bindings Engine (BE), which supports queries containing typed variables and string-processing functions. For example, in response to the query "powerful <noun>", BE will return all the nouns in its index that immediately follow the word "powerful", sorted by frequency. In response to the query "Cities such as ProperNoun(Head(<NounPhrase>))", BE will return a list of proper nouns likely to be city names.
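To make the semantics of a typed-variable query concrete, here is a naive Python sketch (not BE's implementation) that answers "powerful <noun>" by scanning a tiny, hypothetical part-of-speech-tagged corpus. The corpus, the Penn Treebank noun tags, and the function name `bind_noun_after` are assumptions for illustration. Note that a full scan like this touches every token in the corpus, which is exactly the per-query cost an index is meant to avoid.

```python
from collections import Counter

# Hypothetical toy corpus: each sentence is a list of (word, part-of-speech) pairs.
# Tags follow Penn Treebank conventions (NN = singular noun, NNS = plural noun).
CORPUS = [
    [("a", "DT"), ("powerful", "JJ"), ("engine", "NN")],
    [("the", "DT"), ("powerful", "JJ"), ("engine", "NN"), ("runs", "VBZ")],
    [("powerful", "JJ"), ("computers", "NNS"), ("help", "VBP")],
    [("powerful", "JJ"), ("and", "CC"), ("fast", "JJ")],
]

# Assumed definition of the <noun> type: any Penn Treebank noun tag.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def bind_noun_after(term, corpus):
    """Answer the query '<term> <noun>' by scanning every sentence.

    Returns (noun, count) bindings sorted by descending frequency.
    """
    counts = Counter()
    for sentence in corpus:
        # Walk adjacent token pairs and keep nouns that directly follow `term`.
        for (word, _), (next_word, next_tag) in zip(sentence, sentence[1:]):
            if word == term and next_tag in NOUN_TAGS:
                counts[next_word] += 1
    return counts.most_common()


print(bind_noun_after("powerful", CORPUS))
# [('engine', 2), ('computers', 1)]
```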

BE's novel neighborhood index enables it to answer such queries with O(k) random disk seeks and O(k) serial disk reads, where k is the number of non-variable terms in the query. As a result, BE can speed up large-scale language-processing applications by several orders of magnitude. The main cost is a modest increase in the space needed to store the index.
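The following in-memory Python sketch illustrates the general idea of a neighborhood index, assuming (as an illustration, not a description of BE's on-disk format) that each posting for a term also records the word and tag immediately to its right. A query such as "powerful <noun>" can then be answered from the posting list for "powerful" alone: one lookup (standing in for one disk seek) and one sequential pass over that list, with no access to the documents. A query with k non-variable terms would read k such lists, which is where an O(k) bound comes from, and the stored neighbor data is the extra space the index pays for. The names `build_neighborhood_index`, `query_term_then_type`, and the posting layout are hypothetical.

```python
from collections import Counter, defaultdict

# Assumed definition of the <noun> type: any Penn Treebank noun tag.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def build_neighborhood_index(corpus):
    """Map each word to a posting list of (doc_id, position, right_neighbor).

    right_neighbor is the (word, tag) pair that follows the occurrence,
    or None at the end of a sentence. Hypothetical layout for illustration.
    """
    index = defaultdict(list)
    for doc_id, sentence in enumerate(corpus):
        for pos, (word, _) in enumerate(sentence):
            right = sentence[pos + 1] if pos + 1 < len(sentence) else None
            index[word].append((doc_id, pos, right))
    return index


def query_term_then_type(index, term, tags):
    """Answer '<term> <Type>' using only the posting list for `term`.

    The neighbor stored in each posting supplies the variable binding,
    so no document text has to be fetched.
    """
    counts = Counter()
    for _doc_id, _pos, right in index.get(term, []):
        if right is not None and right[1] in tags:
            counts[right[0]] += 1
    return counts.most_common()


# Hypothetical demo corpus of POS-tagged sentences.
DEMO_CORPUS = [
    [("a", "DT"), ("powerful", "JJ"), ("engine", "NN")],
    [("powerful", "JJ"), ("computers", "NNS")],
    [("powerful", "JJ"), ("engine", "NN")],
]

index = build_neighborhood_index(DEMO_CORPUS)
print(query_term_then_type(index, "powerful", NOUN_TAGS))
# [('engine', 2), ('computers', 1)]
```

The design trades space for speed: every posting grows by one neighbor entry, but the query never touches the corpus itself.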
