Corpus Wish List
Over the summer, once we have our server environment set up,
we'll be installing corpora and other resources such as treebanks and
wordnets. (This page will use the cover term "corpora" for all
of those.) Please use this page to put in requests for corpora
you'd like to see. Ideally, a request
should contain the actual corpus title as well as a URL where
information about it can be found. All LDC corpora are fair
game. Free non-LDC corpora should be no problem. Other non-free
corpora will also be considered. If you don't know of a particular
corpus, but have a request for a kind of resource (e.g., a
dependency bank for language X), go ahead and put that on the list
as well. If you see a non-specific request like this and know
of an appropriate resource, please fill in a pointer.
Any further information (such as what you would like to use the
corpus for) is also welcome.
- Prague Czech-English Dependency Treebank (LDC2004T25) URL
- Czech National Corpus URL
- ECI Multilingual Text URL
- Buckeye corpus URL
- British National Corpus URL
- CHAINS (Characterizing INdividual Speakers) URL
-- EmilyBender - 12 Apr 2005, DavidBrodbeck - 07 Jun 2007