The ATAROS (Automatic Tagging and Recognition of Stance) project aims to identify acoustic signals of stance-taking (opinions, evaluations, judgments, etc.) in order to inform the development of automatic stance recognition in natural speech. Because existing corpora generally have a low frequency of stance-taking in conversation, we are creating an audio corpus of dyads completing collaborative tasks designed to elicit a high density of stance-taking at increasing levels of involvement. The project is funded by NSF IIS #1351034, awarded to PIs Gina-Anne Levow, Richard Wright, and Mari Ostendorf.
Coming late 2014: Access to the corpus for other researchers
Technical Report 1 (May 2014): Corpus collection and initial task validation [download manuscript]
Freeman, V., Levow, G.-A., & Wright, R. (2014). “Phonetic marking of stance in a collaborative-task spontaneous-speech corpus.” Presented at the 167th Meeting of the Acoustical Society of America (ASA), Providence, RI, May 5-9. [download handout]
Freeman, V., Chan, J., Levow, G.-A., Wright, R., Ostendorf, M., & Zayats, V. (to appear). Manipulating stance and involvement using collaborative tasks: An exploratory comparison. Proceedings of the 15th Annual Conference of the International Speech Communication Association (Interspeech 2014), Singapore, Sept. 14-18, 2014. [download manuscript]
Luan, Y., Wright, R., Ostendorf, M., & Levow, G.-A. (to appear). Relating automatic vowel space estimates to talker intelligibility. Proceedings of the 15th Annual Conference of the International Speech Communication Association (Interspeech 2014), Singapore, Sept. 14-18, 2014. [download manuscript]
Perceptual Adaptation to Distortions
This project examines several issues related to speech perception under conditions of distortion (e.g., hearing loss, hearing aid amplification, dialect, and foreign accent). Current work focuses on cross-dialect perception and talker familiarity, and on how they interact with different types of hearing loss. This research is funded by NIDCD flow-through funds to Pamela Souza at Northwestern University (NIH grant R01DC006014), and is a collaboration with Northwestern's Hearing Aid Lab.
Souza, P., Gehani, N., Wright, R., & McCloy, D. (submitted: Journal of the American Academy of Audiology). The advantage of knowing the talker. [download manuscript]
Wright, R.A. & Souza, P.E. (accepted for publication). Comparing identification of standardized and regionally-valid vowels. Journal of Speech, Language, and Hearing Research.
Souza, P.E., Wright, R.A., & Bor, S. (accepted for publication). Combined multichannel compression and reduced frequency selectivity for vowel identification. Journal of Speech, Language, and Hearing Research.
Bor, S., Souza, P.E., & Wright, R.A. (2008). Multichannel compression: Effects of reduced spectral contrast on vowel identification. Journal of Speech, Language, and Hearing Research, 51(5), 1315-1327. doi:10.1044/1092-4388(2008/07-0009)
PHOIBLE (Phonetics Information Base and Lexicon) is a knowledge base of phonological inventories, structured as a queryable and extensible mathematical graph. The knowledge base includes allophonic detail for many languages, and all phones are encoded both as Unicode IPA characters and as vectors of distinctive features, allowing "fuzzy" queries for classes of sounds instead of searches on individual glyphs. As of January 2012, the knowledge base included over 1,500 languages.
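As a rough illustration of what a feature-based "fuzzy" query means, the following minimal Python sketch looks up a class of sounds by feature values rather than by glyph. The `PHONES` table, the four feature names, and the `query` helper are all hypothetical and chosen for illustration; they do not reflect PHOIBLE's actual feature set, schema, or query interface.

```python
# Toy inventory: each phone (a Unicode IPA string) maps to a small feature vector.
# Feature names and the phone set are illustrative assumptions, not PHOIBLE data.
PHONES = {
    "p": {"voice": False, "nasal": False, "labial": True,  "strident": False},
    "b": {"voice": True,  "nasal": False, "labial": True,  "strident": False},
    "m": {"voice": True,  "nasal": True,  "labial": True,  "strident": False},
    "t": {"voice": False, "nasal": False, "labial": False, "strident": False},
    "d": {"voice": True,  "nasal": False, "labial": False, "strident": False},
    "n": {"voice": True,  "nasal": True,  "labial": False, "strident": False},
    "s": {"voice": False, "nasal": False, "labial": False, "strident": True},
    "z": {"voice": True,  "nasal": False, "labial": False, "strident": True},
}


def query(inventory, **features):
    """Return the sorted phones whose vectors match every requested feature value."""
    return sorted(
        phone
        for phone, vector in inventory.items()
        if all(vector.get(name) == value for name, value in features.items())
    )


# Ask for a natural class instead of listing individual glyphs.
voiced_labials = query(PHONES, voice=True, labial=True)  # ["b", "m"]
nasals = query(PHONES, nasal=True)                       # ["m", "n"]
```

Specifying fewer features widens the class and specifying more narrows it, which is the sense in which such a query is "fuzzy" compared with matching a single character.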
Moran, S., McCloy, D.R., & Wright, R.A. (under revision: Language). Revisiting population size vs phoneme inventory size. [download manuscript]
Moran, S., McCloy, D.R., & Wright, R.A. (2012). Revisiting the population vs phoneme-inventory correlation. Presented at the 86th Meeting of the Linguistic Society of America, Portland, OR. [download slides or extended abstract]
Moran, S. (2011). Technological infrastructure for comparative linguistics and linguistic fieldwork: Some case studies. Presented at Max-Planck-Institut für evolutionäre Anthropologie, Leipzig, 27 January 2011.
The Vocal Joystick Project seeks to develop a control device that uses both continuous and discrete dimensions of human vocalizations. Four continuous dimensions can be extracted from vowel-like vocalizations: vowel height, vowel backness, pitch, and intensity. In addition, discrete control can be achieved through brief obstruent-like sounds and through words. The extracted dimensions can be used as control parameters for a variety of devices. The most obvious application is computer control in GUI environments, but applications also include robotic arms, thermostats, and lighting levels.
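To make the idea of using vocal dimensions as control parameters concrete, here is a minimal Python sketch of one possible mapping to 2D cursor motion. The mapping itself (backness to horizontal direction, height to vertical direction, intensity to speed), the normalized value ranges, and the names `VocalFrame`, `cursor_velocity`, and `max_speed` are assumptions made for illustration; the text above does not specify how the project assigns dimensions to controls.

```python
from dataclasses import dataclass


@dataclass
class VocalFrame:
    """One analysis frame of a vowel-like vocalization (hypothetical, normalized)."""
    height: float     # 0.0 = low vowel, 1.0 = high vowel
    backness: float   # 0.0 = front vowel, 1.0 = back vowel
    pitch_hz: float   # carried along but unused in this sketch
    intensity: float  # 0.0 = silent, 1.0 = loudest expected


def cursor_velocity(frame, max_speed=10.0):
    """Map a frame to an (x, y) cursor velocity under the assumed mapping."""
    # Center each vowel dimension on 0.5 so a neutral vowel produces no motion.
    dx = (frame.backness - 0.5) * 2.0
    dy = (frame.height - 0.5) * 2.0
    # Louder vocalizations move the cursor faster.
    speed = max_speed * frame.intensity
    return (dx * speed, dy * speed)


# A high, central vowel at half intensity moves the cursor straight along y.
velocity = cursor_velocity(VocalFrame(height=1.0, backness=0.5, pitch_hz=120.0, intensity=0.5))
```

The same four values could just as well drive a single scalar, such as a thermostat setpoint or a lighting level, by reading only one dimension.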
Xiao Li, Jonathan Malkin, Susumu Harada, Jeff Bilmes, Richard Wright and James Landay, "An Online Adaptive Filtering Algorithm for the Vocal Joystick," Interspeech, Pittsburgh, Sep. 2006 [pdf] [bibtex] [video]
Jeff Bilmes, Jonathan Malkin, Xiao Li, Susumu Harada, Kelley Kilanski, Katrin Kirchhoff, Richard Wright, Amarnag Subramanya, James Landay, Patricia Dowden and Howard Chizeck, "The Vocal Joystick," IEEE Intl. Conf. on Audio, Speech and Signal Processing, Toulouse, France, May 2006 [pdf] [bibtex]
Jeff A. Bilmes, Xiao Li, Jonathan Malkin, Kelley Kilanski, Richard Wright, Katrin Kirchhoff, Amarnag Subramanya, Susumu Harada, James A. Landay, Patricia Dowden and Howard Chizeck, "The Vocal Joystick: A Voice-Based Human-Computer Interface for Individuals with Motor Impairments," Human Language Technology Conf. and Conf. on Empirical Methods in Natural Language Processing, Vancouver, Canada, Oct. 2005 [pdf] [bibtex]