UW/Microsoft Quarterly Symposium in Computational Linguistics

Poster presented at the 10th Quarterly Symposium, 10/20/06

Segmentation and Feature Selection for Conversational Speech

William McNeill, Jeremy G. Kahn, Dustin Hillard and Mari Ostendorf

We investigated the use of syntactic information to improve speech recognition accuracy. Previous work has shown that parser-based language models can improve speech recognition and, separately, that the segmentation of speech into sentence-like units affects parser performance. We bring these findings together by investigating two research questions: 1) can good segmentation improve the accuracy of a speech recognition system that uses syntactic information, and 2) which aspects of the syntactic structure represented in a given segment's parses are most useful for improving recognition accuracy?

The system resegments word lattices generated by the SRI 5xRT speech recognizer into sentence-like segments, which are then converted to N-best lists. The N-best hypotheses are parsed, and syntactic information is extracted from these parses for use in a discriminative model that reranks the N-best lists. Segmentation experiments comparing the oracle and baseline segmentations show significant potential gains in word error rate (WER), although the automatic segmentation schemes investigated do not realize these gains. Experiments with different knowledge sources demonstrate that syntactic information can improve recognizer accuracy. Lexicalized non-local syntactic features, which previous work has found useful for identifying high-quality parses, are also shown to be useful for recognition accuracy. The best combination of syntactic features achieved a 6% relative reduction in WER on the oracle segmentation and a 3% relative reduction in WER on automatically detected segmentations.
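
As a rough illustration of the reranking step and of how a relative WER reduction is computed, here is a minimal Python sketch. It assumes a simple linear scoring model, which is only one possible form of discriminative reranker and is not stated in the abstract. The feature names (`recognizer_score`, `parse_score`), the weights, and the toy hypotheses are all hypothetical and are not taken from the poster's experiments.

```python
from typing import Dict, List, Sequence, Tuple

# One N-best entry: (hypothesis words, feature values for that hypothesis).
Hypothesis = Tuple[List[str], Dict[str, float]]


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance between a reference and a hypothesis."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                          # deletion
                curr[j - 1] + 1,                      # insertion
                prev[j - 1] + (ref_word != hyp_word)  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """Word error rate: total edit errors divided by total reference words."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    words = sum(len(r) for r in refs)
    return errors / words


def rerank(nbest: List[Hypothesis], weights: Dict[str, float]) -> List[str]:
    """Return the hypothesis with the highest weighted feature sum."""
    def score(features: Dict[str, float]) -> float:
        return sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    return max(nbest, key=lambda item: score(item[1]))[0]


def relative_reduction(baseline: float, new: float) -> float:
    """Relative reduction of `new` with respect to `baseline`."""
    return (baseline - new) / baseline


# Toy data (hypothetical): one segment with a two-entry N-best list.
reference = "i think that is right".split()
nbest = [
    ("i think that is write".split(),
     {"recognizer_score": -1.0, "parse_score": -5.0}),
    ("i think that is right".split(),
     {"recognizer_score": -1.2, "parse_score": -3.0}),
]

# Baseline uses only the recognizer score; the second model adds a
# (hypothetical) syntactic feature with an arbitrary weight.
baseline_weights = {"recognizer_score": 1.0}
syntax_weights = {"recognizer_score": 1.0, "parse_score": 0.5}

baseline_hyp = rerank(nbest, baseline_weights)
syntax_hyp = rerank(nbest, syntax_weights)

baseline_wer = wer([reference], [baseline_hyp])
syntax_wer = wer([reference], [syntax_hyp])
print(f"baseline WER: {baseline_wer:.2f}, reranked WER: {syntax_wer:.2f}")
print(f"relative reduction: {relative_reduction(baseline_wer, syntax_wer):.0%}")
```

In this toy case the syntactic feature is enough to promote the second hypothesis over the recognizer's top choice. On real data the weights would be learned from training N-best lists rather than set by hand, and WER would be accumulated over all segments before the relative reduction is computed.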
