The purpose of this workshop is to bring together field linguists and others engaged in research on endangered languages with researchers in speech and natural language processing. Our goal is to identify candidate Shared Tasks in Speech and NLP that both pose exciting technical challenges and have the potential to yield valuable enabling technologies for those working directly with endangered languages.
The combined efforts of field linguists will not be sufficient to document the thousands of languages that are expected to disappear by the end of this century. Creating a lasting record of these languages is essential, since every language provides clues about how humans learn and process language. Language description and analysis could be accelerated by applying computational analysis techniques to existing and ever-growing repositories of digital audio and video language data.
This project will organize a workshop that brings together researchers in speech and language technology, researchers on endangered languages, field linguists, and language repository archivists to help drive the development of this computational technology. The workshop participants will discuss how Shared Task Evaluation Campaigns (STECs) can promote the development of speech and language technology for documenting endangered and low-resource languages. Designed to enable comparison across systems and to spread the cost of data preparation across many participants, STECs operationalize research problems as tasks open to any research group, with uniform data, task specifications, and evaluation metrics, and with open reporting of results and system methodologies. Experience with STECs for speech and language technologies has shown that they help standardize shared data, create useful evaluation tools, and provide venues for publishing, comparing, and discussing results. Participants will assess both the language archive data available for computational processing and the types of STECs that should be undertaken. The results of the workshop can have long-term impact by accelerating endangered language documentation, enhancing digital language archives, and fostering the development of novel computational techniques for low-resource languages.