The University of Washington Vocal Joystick (VJ) project strives to produce devices that enable individuals with motor impairments to use their voice to fluently control objects on a computer screen (mouse movement, menus, etc.) and ultimately electro-mechanical instruments. The VJ project involves a variety of research areas, including phonetics, cognitive science, human perception, signal processing, machine learning, and human-computer interaction. The current VJ system allows the user to perform simple mouse-control tasks, such as browsing the web and creating computer-drawn pictures, all with the use of just the voice.
Statistical language modeling remains a challenging task, in particular for morphologically rich languages. Recently, new approaches based on factored language models have been developed to address this problem. These models provide principled ways of including additional conditioning variables other than the preceding words, such as morphological or syntactic features. However, the number of possible choices for model parameters creates a large space of models that cannot be searched exhaustively. This project develops an entirely data-driven model selection procedure based on genetic search, which is shown to outperform both knowledge-based and random selection procedures on two different language modeling tasks (Arabic and Turkish).
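The genetic search described above can be sketched in miniature. This is a hypothetical illustration, not the project's actual implementation: each genome is a bit vector selecting which conditioning factors a candidate factored language model would use, and `fitness` is a toy stand-in for the real objective (negated dev-set perplexity), rewarding one arbitrary "good" factor combination.

```python
import random

# Candidate conditioning factors for a factored LM (illustrative names).
FACTORS = ["word", "stem", "root", "morph", "pos"]

def fitness(genome):
    # Toy stand-in for a dev-set perplexity evaluation: in a real system
    # this would train and score an actual model configuration.
    target = (1, 1, 0, 1, 0)
    return sum(1 for g, t in zip(genome, target) if g == t)

def crossover(a, b):
    # Single-point crossover between two parent genomes.
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(genome, rate=0.1):
    # Flip each bit independently with small probability.
    return tuple(1 - g if random.random() < rate else g for g in genome)

def genetic_search(pop_size=20, generations=30):
    pop = [tuple(random.randint(0, 1) for _ in FACTORS) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection (elitist)
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)

random.seed(0)
best = genetic_search()
```

The elitist selection step keeps the best configurations across generations, so the search never discards its current best model, which matters when each fitness evaluation (a full model training run) is expensive.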
Rich, constraint-based grammar representations (such as those provided by hand-built HPSG grammars) frequently offer a depth of linguistic information not present in more traditional treebanks. But to be useful, they require a way to select the correct parse from among the potentially many that such grammars provide. The approach in this paper is to learn statistically which features best discriminate between correct and incorrect parses. Although the learning phase is computationally expensive, it has the potential to discover discriminative features not anticipated by experts, and has shown initial promise.
Detecting discourse patterns such as dialog acts (DAs) is important for processing spoken conversations and meetings. Various techniques have been used to tag dialog acts in the past, such as hidden Markov models and neural networks. In this work, we perform a full analysis of dialog act tagging using different generative and discriminative dynamic Bayesian networks (DBNs), where both conventional switching n-grams and factored language models (FLMs) are used as DBN edge implementations. Our tests on the ICSI meeting recorder dialog act (MRDA) corpus show that the FLM implementations outperform the switching n-gram approach. Our results also show that using virtual evidence avoids the label bias problem in a discriminative model. Finally, we find that on a corpus such as MRDA, using the dialog acts of previous sentences to help predict current words does not improve our discriminative model.
The lack of sentence boundaries and the presence of disfluencies pose difficulties for parsing conversational speech. This work investigates the effects of automatically detecting these phenomena on the performance of probabilistic parsers. We demonstrate that using a state-of-the-art segmenter gives more than 45% of the possible error reduction in parser performance (relative to a silence-based segmenter) and (for some parsers) the presentation of detected interruption points improves performance over using sentence boundaries alone.
In modern speech recognition systems, language models define the vocabulary and bias the decoder toward more likely word sequences. Training an N-gram language model, the most commonly used type, requires large quantities of text matched to the target recognition task in both style and topic. The lack of in-domain data is a problem when training language models for conversational speech recognition, particularly for languages other than English, because collecting ideal material is costly. Recently, researchers have turned to the World Wide Web as an additional source of training data for language modeling. However, most web text is in written rather than conversational style. We extended to Mandarin the basic approach of searching the web for documents containing conversational phrases, i.e. frequent n-grams from conversational transcripts. As in English, this approach provides text that is better matched to a conversational speaking style in Mandarin. Inspection of the data obtained from the web shows that chatrooms contribute some of the more conversational material.
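The first step of this approach, extracting frequent n-grams from conversational transcripts to serve as web search queries, can be sketched as follows. The tiny "transcripts" and all function names here are illustrative assumptions, not the project's actual pipeline.

```python
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-token windows of a tokenized utterance.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def frequent_ngrams(transcripts, n=3, top_k=5):
    # Count n-grams across all utterances and keep the most frequent
    # ones; these become the "conversational phrase" web queries.
    counts = Counter()
    for utterance in transcripts:
        counts.update(ngrams(utterance.split(), n))
    return [" ".join(g) for g, _ in counts.most_common(top_k)]

# Illustrative stand-in for real conversational transcripts.
transcripts = [
    "i mean you know it was",
    "you know i mean really",
    "it was you know kind of",
]
queries = frequent_ngrams(transcripts, n=2, top_k=3)
```

Phrases like these fillers and discourse markers are exactly what distinguishes conversational speech from written web text, which is why documents containing them tend to be better style matches.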
Updating language model training data is important in any language, but China in particular has been changing rapidly in recent years, and lexical items have been invented or borrowed into Mandarin from other languages. In addition to capturing conversational style, we are also concerned with covering recent words, which may be sparsely represented in the training data. Gathering text from more recent sources, such as the web, can help LMs cover higher-order n-grams containing these words. Because the possible topics in our test data were given, we collected not only general conversational web texts but also topic-oriented conversational web texts, and developed a topic-oriented conversational web data collection protocol accordingly. We also investigate alternatives for explicit topic modeling, including static and dynamic topic models.
Our experiments were conducted on SRI's 5-times real-time system, an early stage of the Rich Transcription Fall 2004 Evaluation system. We obtained a 28% reduction in word perplexity and a 7% relative reduction in character error rate with very simple mixture models. However, relatively little further improvement was obtained from finer-grained modeling of topics.
Text-based ontology population and learning have primarily focused on the use of domain-specific corpora in order to augment general ontologies with specialized terms or generate ontologies for a given domain. The Web contains a wealth of text pages, some domain-specific, others mixing useful information from various knowledge domains. We show how KnowItAll, an autonomous information-extraction system, can help us leverage Web text data for large-scale additions to an existing lexical ontology (WordNet). We explore the unique challenges of large-scale ontology extension from the Web and discuss the implications of Web-based ontology extension for KnowItAll's goal of "life-long" learning.
In this project we investigate a new discriminative model structure applied to the task of part-of-speech (POS) tagging. The more common upward Conditional Markov Model (CMM) is modified to include an additional "observed child" node between pairs of adjacent tags. Rather than each tag being the parent of the next tag, each tag is now the parent of two "observed child" nodes (one to the left and one to the right). This new structure eliminates a pitfall inherent in the CMM: the implicit assumption that current states (tags) are independent of future observations (words).
To train this model we introduce the notion of negative training data. For each "true" sentence from our training dataset (WSJ Penn Treebank III), we generate N "false" sentences by scrambling the words (and associated tags) in the sentence. Training then uses a set of M "positive" sentences, for which the observed-child node is set to 1, and N×M "negative" sentences, for which it is set to 0. (During testing, the observed child is always set to 1.)
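The negative-data generation step can be sketched as follows. This is a minimal illustration under the assumptions stated in the abstract (scrambling words together with their tags); the function name and example sentence are hypothetical.

```python
import random

def make_negatives(tagged_sentence, n, rng):
    # Produce n scrambled copies of a (word, tag) sequence to serve as
    # "false" sentences (observed-child node = 0) during training.
    negatives = []
    for _ in range(n):
        scrambled = list(tagged_sentence)
        while scrambled == list(tagged_sentence):   # ensure a real permutation
            rng.shuffle(scrambled)
        negatives.append(scrambled)
    return negatives

rng = random.Random(0)
sentence = [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")]
negatives = make_negatives(sentence, n=2, rng=rng)
```

Note that each (word, tag) pair moves as a unit, so a negative example preserves the word-tag associations of the original sentence but breaks its word order.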
We compare the performance of this model to three other models: a naive Bayes model, the CMM mentioned above, and a Hidden Markov Model (HMM). With no negative training data, the performance of the new model is comparable to the naive Bayes model, but with negative training data, it outperforms both the CMM and the HMM.
This poster presents work in progress on the use of statistical language models (LMs) to classify text based on reading level. This work is part of a larger project to develop automated tools to help teachers and students find reading-level appropriate texts matched to a particular topic for foreign and second language learners. Bilingual education instructors typically need to seek out "high interest level" texts at low reading levels, e.g. texts at a first or second grade level that support the 5th grade science curriculum. Traditional reading level measures use simple approximations of syntactic complexity and semantic complexity. Sentence length is often used to approximate syntactic complexity, and semantic complexity is indicated by syllable count or word frequency relative to a standard list. We believe that n-gram language models in combination with traditional measures can achieve better performance by more accurately capturing syntactic and semantic information.
In our current work, we use a corpus obtained from Weekly Reader, a children's newspaper with versions for 2nd, 3rd, 4th and 5th grade reading levels. Our basic classifier consists of one n-gram language model per reading level. To classify a text sample, we calculate its perplexity relative to each model and choose the model with the lowest perplexity. A "soft" feature selection method in which some words are selected as features and others are replaced with their part-of-speech (POS) tags was more effective than either word-only or POS-only n-gram models. We tested several feature selection methods: maximum mutual information, information gain, and Kullback-Leibler distance, obtaining the best results with the information gain criterion. With the n-gram scores alone, we have achieved 65% classification accuracy for this 4-class problem.
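The basic lowest-perplexity classification scheme can be sketched in miniature. As a hedged illustration only, this toy version uses add-one-smoothed unigram models and two-sentence "corpora"; the real system uses higher-order n-gram models trained on the Weekly Reader data, and the class names and texts below are invented.

```python
import math
from collections import Counter

class UnigramLM:
    # Add-one-smoothed unigram LM: a simplified stand-in for the
    # per-reading-level n-gram models described above.
    def __init__(self, text):
        self.counts = Counter(text.split())
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1          # +1 slot for unseen words

    def perplexity(self, text):
        tokens = text.split()
        log_prob = sum(
            math.log((self.counts[t] + 1) / (self.total + self.vocab))
            for t in tokens
        )
        return math.exp(-log_prob / len(tokens))

def classify(text, models):
    # Assign the text to the level whose model finds it least surprising.
    return min(models, key=lambda level: models[level].perplexity(text))

# Invented toy training texts standing in for per-grade corpora.
models = {
    "grade2": UnigramLM("the cat sat on the mat the dog ran"),
    "grade5": UnigramLM("photosynthesis converts sunlight into chemical energy"),
}
level = classify("the cat ran", models)
```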
Work in progress includes the development of a secondary classifier (e.g. decision tree) which uses LM scores as described above in addition to traditional features, such as average sentence length, average syllables per word, and more complex features such as features derived from automatic parses.
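The traditional features mentioned above, average sentence length and average syllables per word, can be computed as in this sketch. The vowel-run syllable counter is a crude assumed heuristic, standing in for a real syllabifier or dictionary lookup.

```python
import re

def syllables(word):
    # Heuristic: count maximal runs of vowels; every word gets at least one.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability_features(text):
    # Split on terminal punctuation for sentences; keep alphabetic tokens.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    return {
        "avg_sentence_len": len(words) / len(sentences),
        "avg_syllables": sum(syllables(w) for w in words) / len(words),
    }

feats = readability_features("The cat sat. It saw a tiny bird.")
```

Features like these would feed the secondary classifier alongside the LM perplexity scores.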
Enterprises increasingly seek to understand and manage the expertise of their employees. Currently, however, employee profiles are usually edited and updated manually, a tedious task that often produces poor-quality results. The goal of this research is to develop a user-centric modeling technology that dynamically describes and updates an employee's expertise, enhancing collaboration and productivity in the enterprise environment. Instead of using several keywords, as most traditional approaches do, we propose an evolutionary graphical model, which we call ExpertiseNet, to describe a person. Temporal evolution and relational information are explicitly represented in an ExpertiseNet, making it well suited to mining, retrieval, and visualization. Specifically, we show ExpertiseNets built from a research paper corpus, including both the words of each paper and its citation linkages, and demonstrate how expertise mining and matching are efficiently achieved with these ExpertiseNets.