Fluent conversation involves complex, multi-modal coordination among participants, yet most conversants accomplish it with little effort. In cross-cultural settings, this coordination can be more difficult to achieve. In this talk, I focus on the automatic prediction of verbal feedback, one component of this process, across three language/cultural groups: American English, Mexican Spanish, and Iraqi Arabic. I identify key challenges arising from language-specific differences, inter-speaker variation, and the relative sparseness and optionality of verbal feedback. Our approach addresses these challenges within a machine-learning framework and exploits prosodic features, including pitch, intensity, and duration, to dramatically improve prediction of verbal feedback. Feature analysis identifies both similarities and contrasts across the languages.