The AGGREGATION Project: Automatic Generation of Grammars for Endangered Languages from Glosses and Typological Information

Project Overview

Implemented grammars can contribute to endangered language documentation in several ways. First, the grammars themselves are a rich complement to prose descriptive grammars, allowing linguists to explore analyses at a level of precision not usually achieved in prose descriptions. Furthermore, implemented grammars can be used to create treebanks, that is, collections of utterances (from running text or elicited examples) associated with syntactic and semantic structures. The process of creating a treebank can give the field linguist important feedback about aspects of the linguistic data not covered by current analyses. The resulting treebanks can be used to build further computational tools and are also a rich source of comparable data for qualitative and quantitative work in typology, grounding higher-level linguistic abstractions in actual utterances in a computationally tractable fashion. Despite these advantages, grammar engineering for language documentation has gone largely unexplored. In this project, we investigate how to automate the construction of grammar fragments, building on interlinear glossed text (IGT) and the LinGO Grammar Matrix, a typologically motivated cross-linguistic computational resource.

Publications & Presentations

Zamaraeva, Olga, František Kratochvíl, Emily M. Bender, Fei Xia and Kristen Howell. 2017. Computational Support for Finding Word Classes: A Case Study of Abui. In Proceedings of ComputEL-2: 2nd Workshop on Computational Methods for Endangered Languages, ICLDC 2017, Honolulu, Hawaiʻi.

Howell, Kristen, Emily M. Bender, Michael Lockwood, Fei Xia and Olga Zamaraeva. 2017. Inferring Case Systems from IGT: Impacts and Detection of Variable Glossing Practices. In Proceedings of ComputEL-2: 2nd Workshop on Computational Methods for Endangered Languages, ICLDC 2017, Honolulu, Hawaiʻi.

Xia, Fei, William D. Lewis, Michael Wayne Goodman, Glenn Slayden, Ryan Georgi, Joshua Crowgey and Emily M. Bender. 2016. Enriching a Massively Multilingual Database of Interlinear Glossed Text. Language Resources and Evaluation 50(2):321-349.

Goodman, Michael Wayne, Joshua Crowgey, Fei Xia and Emily M. Bender. 2015. Xigt: Extensible Interlinear Glossed Text for Natural Language Processing. Language Resources and Evaluation 49(2):455-485.

Bender, Emily M., Joshua Crowgey, Michael Wayne Goodman and Fei Xia. 2014. Learning Grammar Specifications from IGT: A Case Study of Chintang. Proceedings of the Workshop on the Use of Computational Methods in the Study of Endangered Languages, ACL 2014, Baltimore, MD. [.bib]

Xia, Fei, William D. Lewis, Michael Wayne Goodman, Joshua Crowgey, and Emily M. Bender. 2014. Enriching ODIN. Proceedings of LREC 2014. [.bib]

Bender, Emily M. 2014. Language CoLLAGE: Grammatical Description with the LinGO Grammar Matrix. Proceedings of LREC 2014. [.bib]

Wax, David. 2014. Automated Grammar Engineering for Verbal Morphology. MS thesis, University of Washington.

Bender, Emily M., Michael Wayne Goodman, Joshua Crowgey and Fei Xia. 2013. Towards Creating Precision Grammars from Interlinear Glossed Text: Inferring Large-Scale Typological Properties. In Proceedings of the ACL 2013 workshop on Language Technology for Cultural Heritage, Social Sciences and Humanities. [.bib]

Bender, Emily M., Fei Xia, Joshua Crowgey and Michael Wayne Goodman. 2013. Towards Automatic Detection of Morphosyntactic Systems from IGT. Paper presented at the workshop Exploring Data from Language Documentation, ZAS Berlin, 10 May 2013. [Slides]

Bender, Emily M., Robert Schikowski, and Balthasar Bickel. 2012. Deriving a Lexicon for a Precision Grammar from Language Documentation Resources: A Case Study of Chintang. Proceedings of COLING 2012. [.bib]

Bender, Emily M., Sumukh Ghodke, Timothy Baldwin, and Rebecca Dridan. 2012. From Database to Treebank: Enhancing Hypertext Grammars with Grammar Engineering and Treebank Search. In Nordhoff, Sebastian and Poggeman, Karl-Ludwig G., eds. Electronic Grammaticography. Honolulu: University of Hawaii Press. pp. 179-206.

Bender, Emily M., David Wax, and Michael Wayne Goodman. 2012. From IGT to Precision Grammar: French Verbal Morphology. LSA Annual Meeting Extended Abstracts 2012.



Project Members

Advisory Board


This material is based upon work supported by the National Science Foundation under Grants No. BCS-1160274 and BCS-1561833. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.