Computational Linguistics as a Manx Cat: The Long and Winding Road of Deep Lexical Acquisition

Jeremy Nicholson, University of Melbourne

In this talk, we detail deep lexical acquisition: the process of automatically learning linguistic structures for use in linguistically rich lexical resources. We examine a series of experiments that leverage available resources and the World Wide Web to predict a range of lexical properties across several languages. As with many tasks in natural language processing, the token-type interface biases evaluation toward the most dense tokens (across the most dense languages!), leaving a long, unloved tail. Finally, we discuss the implications for parsing with deep grammars.

