Lab member thesis topics
Please add a short description of your thesis (MA or PhD) here, with a date indicating
when the description was last added/modified.
Utilizing Multilingual Resources for Automatic Lexical Acquisition (MA)
Michael Wayne Goodman
I'm investigating how we can leverage the knowledge built into the lexicons of large, mature grammars to help bootstrap the lexicons of much smaller grammars. For my test, I am using the Jacy Japanese grammar as the source and the Ita Italian MMT grammar as the target. I am using the Turing Center's Transgraph project to provide word translations, and some hand-built type mappings from one grammar to the other to determine which types a word can have. Because of the nature of the project, many spurious items are produced, so I need to filter the data to try to remove them. Another aspect of the project is to try to automatically learn transfer rules between the grammars involved. This becomes difficult when source words do not transfer to a single target word, when they change argument structure, etc.
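The candidate-generation step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function name `propose_entries`, the toy translation table (standing in for Transgraph output), and the placeholder type names, none of which come from Jacy or the Ita grammar. It only shows why crossing every translation with every mapped type overgenerates and so calls for filtering.

```python
def propose_entries(source_lexicon, translations, type_map):
    """Cross each source entry's translations with its mapped target types.

    source_lexicon: iterable of (source_word, source_type) pairs
    translations:   dict from source word to a list of target words
    type_map:       dict from source type to a list of target types
    All three inputs are hypothetical stand-ins, not real grammar data.
    """
    candidates = set()
    for word, src_type in source_lexicon:
        for tgt_word in translations.get(word, []):
            for tgt_type in type_map.get(src_type, []):
                candidates.add((tgt_word, tgt_type))
    return candidates


# Toy data with placeholder type names (assumed, not taken from either grammar).
source_lexicon = [("inu", "src-noun"), ("taberu", "src-trans-verb")]
translations = {"inu": ["cane"], "taberu": ["mangiare"]}
type_map = {
    "src-noun": ["tgt-noun"],
    "src-trans-verb": ["tgt-trans-verb", "tgt-intrans-verb"],
}

for entry in sorted(propose_entries(source_lexicon, translations, type_map)):
    print(entry)
```

Because every translation is paired with every mapped type, a one-to-many type mapping already yields more candidates than source entries, which is where spurious items come from.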
Generating Referring Expressions (MA)
Margaret Ann Mitchell
I'm exploring the problem of how to refer to entities naturally. This is a sub-task within natural language generation, mapping nonlinguistic data to a linguistic output. My focus is primarily on creating distinguishing descriptions, i.e., given a set of objects from which one object is selected, what noun phrase will be used to refer to it? Current approaches are based on Dale & Reiter's Incremental Algorithm, which uses the Gricean Maxims as a guide to naturalness, but these maxims are prescriptive, not descriptive, and fail to capture what humans actually do. I intend to do some data mining to create a basic preference ordering for adjectives, and use this to propose a new algorithm that better captures human referring expression generation. I also want to touch on which types of referring expressions are used in which contexts, in an attempt to help natural language generators decide how to refer to entities in a stream of text. The domain will probably be limited to the Wall Street Journal.
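For readers unfamiliar with the Incremental Algorithm, here is a simplified sketch of its core loop. It leaves out parts of the published algorithm (for example, the special handling of the head noun and of basic-level values). The function name, the attribute names, and the toy objects are assumptions for illustration. Attributes are tried in a fixed preference order, and an attribute is added only if it rules out at least one remaining distractor.

```python
def incremental_algorithm(target, distractors, preferred_attributes):
    """Simplified core loop of the Incremental Algorithm.

    target, distractors:  dicts mapping attribute names to values
    preferred_attributes: attribute names, most preferred first
    Returns the chosen attribute/value pairs and any distractors
    that the description still fails to exclude.
    """
    description = {}
    remaining = list(distractors)
    for attr in preferred_attributes:
        if not remaining:
            break
        value = target.get(attr)
        if value is None:
            continue
        survivors = [d for d in remaining if d.get(attr) == value]
        # Keep the attribute only if it removes at least one distractor.
        if len(survivors) < len(remaining):
            description[attr] = value
            remaining = survivors
    return description, remaining


# Toy scene (assumed data): pick out the small black dog.
target = {"type": "dog", "colour": "black", "size": "small"}
distractors = [
    {"type": "dog", "colour": "white", "size": "small"},
    {"type": "cat", "colour": "black", "size": "small"},
]
description, left_over = incremental_algorithm(
    target, distractors, ["type", "colour", "size"]
)
print(description)  # {'type': 'dog', 'colour': 'black'}
```

The preference order is the only place where naturalness enters this loop, which is why a data-derived ordering for adjectives could change its output.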
-- Main.itallow - 02 Apr 2008
Dealing with imperfection in using statistical syntax for machine translation
Jeremy G. Kahn
(tentative title, abstract) I am exploring the various ways to use statistical syntax (e.g. the Charniak parser) for (statistical) machine translation (SMT). My research includes using syntax for word-alignment, MT evaluation, and tuning upstream systems (such as ASR). Current SMT systems do not incorporate syntax, and use "phrases" that are quite explicitly non-syntactic, which raises challenges for the inclusion of syntax in translation modeling. I am particularly interested in:
- using dependency extraction as a measure of syntactic/semantic similarity
- how to cope with (or better, make use of) the uncertainty of a statistical parser in these contexts: how can that uncertainty be made useful?
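One simple way to make the first item concrete is to score a hypothesis against a reference by the overlap of their dependency triples. The sketch below is an illustrative assumption, not the measure used in this thesis. The function name, the (head, relation, dependent) triple format, and the hand-written triples are all hypothetical, and a real system would obtain the triples from a parser.

```python
from collections import Counter


def dependency_f1(hyp_deps, ref_deps):
    """F1 over bags of (head, relation, dependent) triples."""
    hyp, ref = Counter(hyp_deps), Counter(ref_deps)
    overlap = sum((hyp & ref).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hand-written triples (assumed, not parser output).
reference = [("saw", "nsubj", "dog"), ("saw", "dobj", "cat"), ("dog", "det", "the")]
hypothesis = [("saw", "nsubj", "dog"), ("saw", "dobj", "bird"), ("dog", "det", "the")]
print(dependency_f1(hypothesis, reference))
```

A natural extension toward the second item would be to weight each triple by the parser's confidence over an n-best list instead of counting it once, though that is not shown here.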
-- Main.jgk - 02 Apr 2008
Mass Text Annotation With Mechanical Turk
I am using Amazon's Mechanical Turk service to do multi-user annotation of linguistic phenomena in Wikipedia text. I'm trying to see if I can get good inter-annotator agreement for different kinds of noun phrase annotation. The hope is that this could be a cheaper alternative way of producing annotated corpora. Along the way I am developing reusable Ruby libraries to efficiently parse web text, extract constituents matching certain criteria, and automatically generate Mechanical Turk questions.
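As an illustration of how inter-annotator agreement can be quantified, the sketch below computes Cohen's kappa for two annotators. This is a simplifying assumption: the project involves many annotators, for which a multi-rater statistic such as Fleiss' kappa would be the usual choice, and the text does not say which measure is used. The function name and the labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy annotations (assumed data): is each candidate span a noun phrase?
annotator_1 = ["NP", "NP", "other", "NP"]
annotator_2 = ["NP", "other", "other", "NP"]
print(cohens_kappa(annotator_1, annotator_2))  # 0.5
```

Here the annotators agree on 3 of 4 items (0.75) while chance agreement is 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.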