Fluent conversation involves complex, multi-modal coordination among participants, yet most conversants accomplish it with little effort. In cross-cultural settings, this coordination can prove more difficult to achieve. In this talk, I focus on automatic prediction of verbal feedback, one component of this process, across three language/cultural groups: American English, Mexican Spanish, and Iraqi Arabic. I identify key challenges arising from language-specific differences, inter-speaker variation, and the relative sparseness and optionality of verbal feedback. Our approach addresses these challenges through a machine-learning framework that exploits prosodic features, including pitch, intensity, and duration, to dramatically improve prediction of verbal feedback. Feature analysis identifies both similarities and contrasts across languages.
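To make the prediction setup concrete, here is a minimal, hypothetical sketch of the kind of model the abstract describes: a classifier that maps prosodic features of a speaker's utterance (e.g., pitch movement, intensity, final-syllable duration) to a binary decision about whether a listener is likely to produce verbal feedback. The feature names, toy data, and logistic-regression model below are illustrative assumptions for exposition, not the talk's actual system.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(examples, labels, lr=0.5, epochs=200):
    """Plain logistic regression trained with stochastic gradient descent."""
    n_feats = len(examples[0])
    w = [0.0] * n_feats
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of log loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict_feedback(w, b, x):
    """Return True if prosodic cues suggest a verbal-feedback opportunity."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5

# Toy feature vectors (all values invented for illustration):
# [falling pitch slope, intensity drop, lengthened final syllable]
X = [
    [1.0, 1.0, 1.0],  # strong prosodic cues -> listener feedback observed
    [0.9, 0.8, 1.0],
    [0.1, 0.2, 0.0],  # weak cues -> no feedback
    [0.0, 0.1, 0.2],
]
y = [1, 1, 0, 0]

w, b = train_logistic(X, y)
print(predict_feedback(w, b, [0.95, 0.9, 1.0]))  # strong cues
print(predict_feedback(w, b, [0.05, 0.1, 0.1]))  # weak cues
```

A real system would extract such features automatically from the speech signal and would have to handle the sparseness and cross-language variation the abstract highlights; this sketch only shows the basic feature-to-prediction mapping.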