Transcription and Coding Tools

Tools for automated orthographic transcription, phone-level forced alignment, annotation and metadata tagging.

Filename

Author

Description

Alicia Wassink
Rob Squizzero
Campion Fellin
David Nichols
A web interface for automatically transcribing sociolinguistic interviews.
A guide for moving between ELAN and p2fa.
A modified version of the Carnegie Mellon University (CMU) Pronouncing Dictionary, used for forced alignment of recordings. It adds some words missing from the original release, along with other miscellaneous words such as Northwest place names.
The documentation for the Montreal Forced Aligner, which is installed on the following lab computers: Astrid, Chesterton.
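As a rough illustration of how a pronouncing dictionary like the modified one described above can be extended, the following Python sketch merges a supplementary word list into a base dictionary. It assumes the plain-text CMU dictionary layout (one entry per line, a word followed by its ARPAbet phones, comment lines starting with ";;;"). The function names and the sample entries are hypothetical, and the pronunciations are for illustration only; they are not taken from the lab's dictionary.

```python
import re
from collections import defaultdict


def parse_dict(lines):
    """Parse CMUdict-style lines into {WORD: [pronunciation tuples]}."""
    entries = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and ';;;' comment lines.
        if not line or line.startswith(";;;"):
            continue
        word, *phones = line.split()
        if not phones:
            continue
        # Variant markers such as WORD(1) are folded into the base word.
        word = re.sub(r"\(\d+\)$", "", word).upper()
        pron = tuple(phones)
        if pron not in entries[word]:
            entries[word].append(pron)
    return dict(entries)


def merge_dicts(base, extra):
    """Return a new dictionary with entries from `extra` added to `base`."""
    merged = {word: list(prons) for word, prons in base.items()}
    for word, prons in extra.items():
        target = merged.setdefault(word, [])
        for pron in prons:
            if pron not in target:
                target.append(pron)
    return merged


# Hypothetical sample data (illustrative pronunciations only).
base_lines = [
    ";;; base dictionary",
    "SEATTLE  S IY0 AE1 T AH0 L",
]
extra_lines = [
    ";;; supplementary place names",
    "PUYALLUP  P Y UW0 AE1 L AH0 P",
    "SEATTLE  S IY0 AE1 T AH0 L",
]

merged = merge_dicts(parse_dict(base_lines), parse_dict(extra_lines))
for word in sorted(merged):
    for pron in merged[word]:
        print(word, " ".join(pron))
```

Keeping the supplementary words in a separate file and merging them at build time makes it easier to update the base dictionary later without losing local additions.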

Filename

Author

Description

Free and open-source software for annotating audio and video recordings.

Filename

Author

Description

The guidelines for, and principles behind, the conversational analysis coding in the PNWE. The document has since been expanded so that it can be used to establish guidelines for any coding of sociolinguistic data.
A scan of Appendix 2 from John W. Du Bois's 1991 article Transcription Design Principles for Spoken Discourse Research. Provides a consistent list of tags that can be used in annotating discourse.
A guide for using ELAN, updated to include the Speed Transcription Mode.
A guide for moving between ELAN and p2fa.

Filename

Author

Description

Metadata tags used within the University of Washington Sociolinguistics laboratory community for file storage and metadata encoding. Intended to promote improved file retrieval, sharing, repurposing, and search of sociolinguistic data.
Strips conversational transcriptions of a range of commonly encountered annotations. It is designed for use with the PNWEI recordings that were transcribed in Praat tiers (and so handles Praat header information and markup, discourse markup such as <@>, and timestamps), as well as with PNWEII recordings that were transcribed in ELAN. (R script)
Martin Horst
R script for the PNWE project. It searches an ELAN-generated .txt file for a tier called "Themes" and locates the MUC-7 tags on that tier (for example, tags marking where particular topics occur in a conversational recording that has been coded using conversational analysis). It extracts each theme label, which is then used to sort the output, together with the associated annotations from the transcription tier and the timestamps of those annotations.
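The lab's script is written in R and is not reproduced here. The Python sketch below only illustrates the general idea of pairing theme tags with the transcription annotations they overlap in time. It assumes a tab-delimited export with four columns per line (tier name, begin time in seconds, end time in seconds, annotation text). ELAN's actual column layout depends on the export options chosen, so this layout, the tier name "Transcription", the function name, and the sample rows are all assumptions.

```python
def extract_themes(text, theme_tier="Themes", transcript_tier="Transcription"):
    """Return (label, begin, end, words) tuples sorted by theme label, then time."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Assumed layout: tier <TAB> begin <TAB> end <TAB> annotation
        tier, begin, end, value = line.split("\t", 3)
        rows.append((tier, float(begin), float(end), value))

    themes = [r for r in rows if r[0] == theme_tier]
    speech = [r for r in rows if r[0] == transcript_tier]

    results = []
    for _, t_begin, t_end, label in themes:
        for _, s_begin, s_end, words in speech:
            # Keep transcription annotations that overlap the theme interval.
            if s_begin < t_end and s_end > t_begin:
                results.append((label, s_begin, s_end, words))

    # Sort by theme label first, then by start time within each theme.
    results.sort(key=lambda r: (r[0], r[1]))
    return results


# Hypothetical sample export (labels and text are illustrative only).
sample = "\n".join([
    "Transcription\t0.0\t2.5\twe moved here in ninety two",
    "Transcription\t2.5\t5.0\tmy dad worked at the mill",
    "Transcription\t5.0\t7.0\tthe salmon runs were huge",
    "Themes\t2.5\t5.0\tORGANIZATION",
    "Themes\t0.0\t2.5\tLOCATION",
])

for label, begin, end, words in extract_themes(sample):
    print(f"{label}\t{begin:.1f}\t{end:.1f}\t{words}")
```

Matching by time overlap rather than by exact boundaries means that a theme tag spanning several transcription annotations returns all of them.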