Fei Xia, William Lewis, and Dan Jinguji

UW Linguistics

Towards automatic enrichment and analysis of linguistic data for low-density languages

UW/Microsoft Symposium, 10/20/06

Language-specific computational tools such as parsers can greatly benefit linguistic research, but building them depends heavily on significant quantities of hand-annotated data. In this work we explore whether a resource created for another purpose, and spanning hundreds of the world's languages, can be used instead. We draw inspiration from Yarowsky and Ngai (2001), who tested methods for projecting linguistic annotations from one language to another. We seek to extend their methodology to a broad set of languages by manipulating a database of interlinearized language examples and projecting phrase and dependency structures from automatically parsed English data onto aligned source language data. Our methods have been successfully applied to a small set of languages (Chamorro, German, Hausa, Irish, Korean, Malagasy, Welsh, and Yaqui), and the results are encouraging.
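
The projection step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `project_dependencies`, the dictionary-based data layout, and the rule for skipping unaligned English words are all assumptions made for illustration. It assumes a one-to-one word alignment between the English translation and the source-language line, and copies each English head-dependent relation onto the aligned source words.

```python
def project_dependencies(english_heads, alignment):
    """Project English dependency heads onto source-language words.

    english_heads: dict mapping English word index -> head index
                   (None marks the root of the English parse).
    alignment:     dict mapping English word index -> source word index
                   (hypothetical one-to-one alignment; unaligned English
                   words are simply absent from the dict).
    Returns a dict mapping source word index -> source head index
    (None marks a projected root).
    """
    projected = {}
    for child, head in english_heads.items():
        if child not in alignment:
            continue  # unaligned English word: nothing to project
        src_child = alignment[child]
        # Assumption: if the English head is unaligned, climb to the
        # nearest aligned ancestor instead of dropping the relation.
        while head is not None and head not in alignment:
            head = english_heads[head]
        src_head = None if head is None else alignment[head]
        if src_head == src_child:
            continue  # both ends map to the same source word
        projected.setdefault(src_child, src_head)
    return projected


# English: 0 "the", 1 "dog", 2 "barks"; "barks" is the root.
english_heads = {0: 1, 1: 2, 2: None}
# Hypothetical verb-initial source sentence with two words:
# English "dog" aligns to source word 1, "barks" to source word 0.
alignment = {1: 1, 2: 0}
source_heads = project_dependencies(english_heads, alignment)
```

In this toy case the unaligned English determiner is ignored, the source verb becomes the projected root, and the source noun attaches to it. A realistic system would also need to handle one-to-many alignments and source words with no English counterpart, which this sketch leaves out.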

