The AGGREGATION Project: Automatic Generation of Grammars for Endangered Languages from Glosses and Typological Information

Project Overview

Implemented grammars can contribute to endangered language documentation in several ways. First, the grammars themselves are a very rich complement to prose descriptive grammars, allowing linguists to explore analyses at a level of precision rarely achieved in prose descriptions. Second, implemented grammars can be used to create treebanks: collections of utterances (from running text or elicited examples) paired with syntactic and semantic structures. Building a treebank gives the field linguist important feedback about aspects of the linguistic data that current analyses do not cover. The resulting treebanks can in turn support further computational tools, and they are a rich source of comparable data for qualitative and quantitative work in typology, grounding higher-level linguistic abstractions in actual utterances in a computationally tractable way. Despite these advantages, grammar engineering for language documentation has gone largely unexplored. In this project, we investigate how to automate the construction of grammar fragments, building on interlinear glossed text (IGT) and the LinGO Grammar Matrix, a typologically motivated, cross-linguistic computational resource.

Publications & Presentations

Howell, Kristen and Emily M. Bender. 2022. Building Analyses from Syntactic Inference in Local Languages: An HPSG Grammar Inference System. The Northern European Journal of Language Technology (NEJLT) 8(1).

Howell, Kristen. 2020. Inferring Grammars from Interlinear Glossed Text: Extracting Typological and Lexical Properties for the Automatic Generation of HPSG Grammars. PhD thesis, University of Washington.

Bender, Emily M., Joshua Crowgey, Michael Wayne Goodman, Kristen Howell, Haley Lepp, Fei Xia, and Olga Zamaraeva. 2020. AGGREGATION: Building Computational Resources Automatically from IGT. Invited poster at Reflections on the Impact of DEL-funded Research Over Fifteen Years, LSA 2020, New Orleans, LA. January 3, 2020.

Lepp, Haley, Olga Zamaraeva and Emily M. Bender. 2019. Visualizing Inferred Morphotactic Systems. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), pp. 127-131.

Zamaraeva, Olga, Kristen Howell and Emily M. Bender. 2019. Handling Cross-cutting Properties in Automatic Inference of Lexical Classes: A Case Study of Chintang. Proceedings of the 3rd Workshop on the Use of Computational Methods in the Study of Endangered Languages, Honolulu, HI, pp. 28-38.

Zamaraeva, Olga, Emily M. Bender, Michael Wayne Goodman, Kristen Howell, and Fei Xia. Improving Toolbox IGT using the Xigt data model. Technology Showcase, 6th International Conference on Language Documentation and Conservation, Honolulu, HI.

Zamaraeva, Olga, Kristen Howell and Emily M. Bender. 2019. Modeling Clausal Complementation for a Grammar Engineering Resource. Proceedings of the Society for Computation in Linguistics Vol. 2, Article 6.

Howell, Kristen, Olga Zamaraeva, and Emily M. Bender. 2018. Nominalized Clauses in the Grammar Matrix. In Müller, Stefan and Frank Richter, eds. Proceedings of the 25th International Conference on Head-Driven Phrase Structure Grammar, pp. 68-88. University of Tokyo.

Howell, Kristen and Olga Zamaraeva. 2018. Clausal Modifiers in the Grammar Matrix. In Proceedings of COLING 2018, Santa Fe, NM.

Zamaraeva, Olga, Kristen Howell, and Emily M. Bender. 2018. A Cross-Linguistic Account of Subordinator and Subordinate Clause Position. Poster presented at HPSG 2018.

Zamaraeva, Olga, František Kratochvíl, Emily M. Bender, Fei Xia and Kristen Howell. 2017. Computational Support for Finding Word Classes: A Case Study of Abui. In Proceedings of ComputEL-2: 2nd Workshop on Computational Methods for Endangered Languages, ICLDC 2017, Honolulu, Hawaiʻi.

Howell, Kristen, Emily M. Bender, Michael Lockwood, Fei Xia and Olga Zamaraeva. 2017. Inferring Case Systems from IGT: Impacts and Detection of Variable Glossing Practices. In Proceedings of ComputEL-2: 2nd Workshop on Computational Methods for Endangered Languages, ICLDC 2017, Honolulu, Hawaiʻi.

Xia, Fei, William D. Lewis, Michael Wayne Goodman, Glenn Slayden, Ryan Georgi, Joshua Crowgey and Emily M. Bender. 2016. Enriching a Massively Multilingual Database of Interlinear Glossed Text. Language Resources and Evaluation. 50(2):321-349.

Zamaraeva, Olga. 2016. Inferring Morphotactics from Interlinear Glossed Text: Combining Clustering and Precision Grammars. Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology.

Goodman, Michael Wayne, Joshua Crowgey, Fei Xia and Emily M. Bender. 2015. Xigt: Extensible Interlinear Glossed Text for Natural Language Processing. Language Resources and Evaluation 49(2):455-485.

Bender, Emily M., Joshua Crowgey, Michael Wayne Goodman and Fei Xia. 2014. Learning Grammar Specifications from IGT: A Case Study of Chintang. Proceedings of the Workshop on the Use of Computational Methods in the Study of Endangered Languages, ACL 2014, Baltimore, MD.

Xia, Fei, William D. Lewis, Michael Wayne Goodman, Joshua Crowgey, and Emily M. Bender. 2014. Enriching ODIN. Proceedings of LREC 2014.

Bender, Emily M. 2014. Language CoLLAGE: Grammatical Description with the LinGO Grammar Matrix. Proceedings of LREC 2014.

Wax, David. 2014. Automated Grammar Engineering for Verbal Morphology. MS thesis, University of Washington.

Bender, Emily M., Michael Wayne Goodman, Joshua Crowgey and Fei Xia. 2013. Towards Creating Precision Grammars from Interlinear Glossed Text: Inferring Large-Scale Typological Properties. In Proceedings of the ACL 2013 workshop on Language Technology for Cultural Heritage, Social Sciences and Humanities.

Bender, Emily M., Fei Xia, Joshua Crowgey and Michael Wayne Goodman. 2013. Towards Automatic Detection of Morphosyntactic Systems from IGT. Paper presented at the workshop Exploring Data from Language Documentation, ZAS Berlin, 10 May 2013.

Bender, Emily M., Robert Schikowski, and Balthasar Bickel. 2012. Deriving a Lexicon for a Precision Grammar from Language Documentation Resources: A Case Study of Chintang. Proceedings of COLING 2012.

Bender, Emily M., Sumukh Ghodke, Timothy Baldwin, and Rebecca Dridan. 2012. From Database to Treebank: Enhancing Hypertext Grammars with Grammar Engineering and Treebank Search. In Nordhoff, Sebastian and Karl-Ludwig G. Poggeman, eds. Electronic Grammaticography. Honolulu: University of Hawaii Press, pp. 179-206.

Bender, Emily M., David Wax, and Michael Wayne Goodman. 2012. From IGT to Precision Grammar: French Verbal Morphology. LSA Annual Meeting Extended Abstracts 2012.

Software

Data

Project Members

Advisory Board

Acknowledgments

This material is based upon work supported by the National Science Foundation under Grants No. BCS-1160274 and BCS-1561833. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.