Difference: NlpCorpora (1 vs. 15)

Revision 15 - 2011-08-23 - brodbd

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Corpus Usage Guidelines

Line: 9 to 9
 
  1. Compling laboratory members are granted access to corpora solely for coursework and research projects in the context of their affiliation with the UW.
  2. Corpora may not be copied from the servers, nor used in commercial applications, unless permitted by the corpus license agreement.
  3. Many of the corpora have additional licensing conditions (see the CompLing Database.) Before you access any particular corpus, you are responsible for reading and understanding the license. For LDC corpora, you should also read the general membership agreement.
Changed:
<
<
  4. For some of the corpora, we must maintain a list of individuals granted access and/or have each user sign an individual license agreement. This is indicated in the "Restriction" column in the database. To access these corpora, you'll need to contact the lab director to obtain read permissions on the relevant directories.
>
>
  4. For some of the corpora, we must maintain a list of individuals granted access and/or have each user sign an individual license agreement. This is indicated in the "Restriction" column in the database. To access these corpora, you'll need to click the "Request Access" button and agree to the license agreement.
 
  5. Whenever you use a corpus for course work or for a paper, you should cite the corpus among your references. The proper citation information should be found in the license or README file of the corpus.
  6. Failure to follow these policies could result in loss of access to the corpora, or to the lab/servers in general.

Revision 14 - 2011-04-21 - brodbd

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Corpus Usage Guidelines

Line: 15 to 15
 

Available corpora

Changed:
<
<
For a list of currently available corpora, along with their licensing and access information, see the CompLing Database.
>
>
For a list of currently available corpora, along with their licensing and access information, see the CompLing Database.
  (If your browser prompts you with a certificate warning, you need to install the UW root certificate.)
Line: 31 to 31
 Lab members who would like access to a corpus listed as "Available" in the database should send an email to linghelp@u with a request for it to be installed.

Lab members who would like access to a corpus not listed in the database should send an email to Emily (ebender at u) with the request.

Deleted:
<
<
-- brodbd - 29 Dec 2008
 \ No newline at end of file

Revision 13 - 2011-04-05 - brodbd

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Corpus Usage Guidelines

Line: 8 to 8
 
  1. Compling laboratory members are granted access to corpora solely for coursework and research projects in the context of their affiliation with the UW.
  2. Corpora may not be copied from the servers, nor used in commercial applications, unless permitted by the corpus license agreement.
Changed:
<
<
  3. Many of the corpora have additional licensing conditions (see the CompLing Database.) Before you access any particular corpus, you are responsible for reading and understanding the license. For LDC corpora, you should also read the general membership agreement.
>
>
  3. Many of the corpora have additional licensing conditions (see the CompLing Database.) Before you access any particular corpus, you are responsible for reading and understanding the license. For LDC corpora, you should also read the general membership agreement.
 
  4. For some of the corpora, we must maintain a list of individuals granted access and/or have each user sign an individual license agreement. This is indicated in the "Restriction" column in the database. To access these corpora, you'll need to contact the lab director to obtain read permissions on the relevant directories.
  5. Whenever you use a corpus for course work or for a paper, you should cite the corpus among your references. The proper citation information should be found in the license or README file of the corpus.
  6. Failure to follow these policies could result in loss of access to the corpora, or to the lab/servers in general.

Revision 12 - 2008-12-29 - brodbd

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Corpus Usage Guidelines

Line: 6 to 6
  In order to ensure compliance with the licenses for the various corpora we have installed, we have instituted the following policies.
Changed:
<
<
  1. Compling laboratory members are granted access to corpora solely for coursework and research projects in the context of their affiliation with the UW.
  2. Corpora may not be copied from the servers, nor used in commercial applications, unless permitted by the corpus license agreement.
  3. Many of the corpora have additional licensing conditions (see the CompLing Database.) Before you access any particular corpus, you are responsible for reading and understanding the license. For LDC corpora, you should also read the general membership agreement.
  4. For some of the corpora, we must maintain a list of individuals granted access and/or have each user sign an individual license agreement. This is indicated in the "Restriction" column in the database. To access these corpora, you'll need to contact the lab director to obtain read permissions on the relevant directories.
  5. Whenever you use a corpus for course work or for a paper, you should cite the corpus among your references. The proper citation information should be found in the license or README file of the corpus.
  6. Failure to follow these policies could result in loss of access to the corpora, or to the lab/servers in general.
>
>
  1. Compling laboratory members are granted access to corpora solely for coursework and research projects in the context of their affiliation with the UW.
  2. Corpora may not be copied from the servers, nor used in commercial applications, unless permitted by the corpus license agreement.
  3. Many of the corpora have additional licensing conditions (see the CompLing Database.) Before you access any particular corpus, you are responsible for reading and understanding the license. For LDC corpora, you should also read the general membership agreement.
  4. For some of the corpora, we must maintain a list of individuals granted access and/or have each user sign an individual license agreement. This is indicated in the "Restriction" column in the database. To access these corpora, you'll need to contact the lab director to obtain read permissions on the relevant directories.
  5. Whenever you use a corpus for course work or for a paper, you should cite the corpus among your references. The proper citation information should be found in the license or README file of the corpus.
  6. Failure to follow these policies could result in loss of access to the corpora, or to the lab/servers in general.
 

Available corpora

For a list of currently available corpora, along with their licensing and access information, see the CompLing Database.

Changed:
<
<
(If your browser prompts you with a certificate warning, you need to install the UW root certificate.)
>
>
(If your browser prompts you with a certificate warning, you need to install the UW root certificate.)

Terminology:

  • Installed means the corpus is currently installed on the server and ready to use.
  • Available means the corpus is immediately available, but not currently installed on the server.
  • Requested means that a request has been put in to LDC for the corpus, but it's not immediately available.

We can obtain any LDC corpus, but there may be a lead time of several weeks for corpora that are not listed in the database.

 

Requesting additional corpora

Lab members who would like access to a corpus listed as "Available" in the database should send an email to linghelp@u with a request for it to be installed.

Changed:
<
<
Lab members who would like access to a corpus not listed in the database should list it on CorpusWishList, and send an email to Emily (ebender at u) with the request.

-- brodbd - 07 July 2008

>
>
Lab members who would like access to a corpus not listed in the database should send an email to Emily (ebender at u) with the request.
 
Added:
>
>
-- brodbd - 29 Dec 2008
 \ No newline at end of file

Revision 11 - 2008-07-07 - brodbd

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Corpus Usage Guidelines

Line: 7 to 7
 In order to ensure compliance with the licenses for the various corpora we have installed, we have instituted the following policies.

  1. Compling laboratory members are granted access to corpora solely for coursework and research projects in the context of their affiliation with the UW.
Changed:
<
<
  2. Corpora may not be copied from the servers, nor used in commercial applications.
  3. Many of the corpora have additional licensing conditions (see the CompLing Database.) Before you access any particular corpus, you are responsible for reading and understanding the license, in addition to the general membership agreement.
>
>
  2. Corpora may not be copied from the servers, nor used in commercial applications, unless permitted by the corpus license agreement.
  3. Many of the corpora have additional licensing conditions (see the CompLing Database.) Before you access any particular corpus, you are responsible for reading and understanding the license. For LDC corpora, you should also read the general membership agreement.
 
  4. For some of the corpora, we must maintain a list of individuals granted access and/or have each user sign an individual license agreement. This is indicated in the "Restriction" column in the database. To access these corpora, you'll need to contact the lab director to obtain read permissions on the relevant directories.
  5. Whenever you use a corpus for course work or for a paper, you should cite the corpus among your references. The proper citation information should be found in the license or README file of the corpus.
  6. Failure to follow these policies could result in loss of access to the corpora, or to the lab/servers in general.
Line: 25 to 25
  Lab members who would like access to a corpus not listed in the database should list it on CorpusWishList, and send an email to Emily (ebender at u) with the request.
Changed:
<
<
-- DavidBrodbeck - 07 Jun 2007
>
>
-- brodbd - 07 July 2008
 

Revision 10 - 2007-06-07 - DavidBrodbeck

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Changed:
<
<

Corpus Availability and Use

>
>

Corpus Usage Guidelines

 

Access Policies

Line: 8 to 8
 
  1. Compling laboratory members are granted access to corpora solely for coursework and research projects in the context of their affiliation with the UW.
  2. Corpora may not be copied from the servers, nor used in commercial applications.
Changed:
<
<
  3. Many of the corpora have additional licensing conditions (see links in the list below). Before you access any particular corpus, you are responsible for reading and understanding the license, in addition to the general membership agreement.
  4. For some of the corpora (marked "Restricted Access"), we must maintain a list of individuals granted access and/or have each user sign an individual license agreement. To access these corpora, you'll need to contact the lab director to obtain read permissions on the relevant directories.
>
>
  3. Many of the corpora have additional licensing conditions (see the CompLing Database.) Before you access any particular corpus, you are responsible for reading and understanding the license, in addition to the general membership agreement.
  4. For some of the corpora, we must maintain a list of individuals granted access and/or have each user sign an individual license agreement. This is indicated in the "Restriction" column in the database. To access these corpora, you'll need to contact the lab director to obtain read permissions on the relevant directories.
 
  5. Whenever you use a corpus for course work or for a paper, you should cite the corpus among your references. The proper citation information should be found in the license or README file of the corpus.
  6. Failure to follow these policies could result in loss of access to the corpora, or to the lab/servers in general.
Changed:
<
<

Requesting additional corpora

Lab members who would like access to a corpus not listed below should list it on CorpusWishList, and send an email to Emily (ebender at u) with the request.

>
>

Available corpora

 
Changed:
<
<

Instructions for Instructors

>
>
For a list of currently available corpora, along with their licensing and access information, see the CompLing Database.
 
Changed:
<
<
When you request that a new corpus be installed, please add its information to the table below. To determine whether the corpus has a specific license (beyond the LDC general license), look at its catalogue entry. Corpora with specific licenses have a line that says "Member license: yes", and links to the license. Read the license carefully to determine whether we need to maintain a list of users who have access (the "Restricted (l)" category) and/or need to have users sign an individual license in order to access the corpus (the "Restricted (s)" category).
>
>
(If your browser prompts you with a certificate warning, you need to install the UW root certificate.)
 
Changed:
<
<

Corpora installed on Pongo

>
>

Requesting additional corpora

 
Changed:
<
<
Title/Catalogue link LDC Catalogue Number Directory Name Language(s) Restricted Access License
UN Parallel Text (Complete) LDC94T4A LDC94T4A French, Spanish, English   Specific
ECI Multilingual Text LDC94T5 LDC94T05 Multi   Specific
English Treebank 2 LDC95T7 LDC95T07 English   General
Japanese Business News Text LDC95T8 LDC95T08 Japanese Restricted (s) Specific
European Language Newspaper Text LDC95T11 LDC95T11 English Restricted (l) Specific
Mandarin Chinese News Text LDC95T13 LDC95T13 Mandarin   Specific
Hansard French/English LDC95T20 LDC95T20 French, English   General
CELEX2 LDC1996L14 LDC96L14 Dutch, German, English   Specific
CALLFRIEND American English-Non-Southern Dialect LDC96S46 LDC_CALLFRIEND_DISC_{1,2,3} English   General
DSO Corpus of Sense-Tagged English LDC97T12 LDC97T12 English   General
CALLHOME Egyptian Arabic Transcripts LDC97T19 LDC97T19 Arabic   General
Japanese Business News Text Supplement LDC99T34 LDC99T34 Japanese Restricted (s) Specific
Portuguese Newswire Text LDC99T40 LDC99T40 Portuguese   General
Spanish Newswire Text, Volume 2 LDC99T41 LDC99T41 Spanish   General
Treebank-3 LDC99T42 LDC99T42 English   General
Korean Newswire LDC2000T45 LDC00T45 Korean   General
MUC 7 LDC2001T02 LDC01T02 English   General
Chinese-English Translation Lexicon (v3.0) LDC2002L27 LDC02L27 Chinese, English   General
Multiple-Translation Chinese Corpus LDC2002T01 LDC02T01 Chinese, English   General
Korean English Treebank Annotation LDC2002T26 LDC02T26 Korean, English   General
The AQUAINT Corpus of English News Text LDC2002T31 LDC02T31 English   General
Chinese <-> English Name Entity Lists LDC2003E01 LDC03E01 Chinese, English   General
Arabic Treebank: Part 1 v 2.0 LDC2003T06 LDC03T06 Arabic, English   General
Arabic Gigaword LDC2003T12 LDC03T12 Arabic   General
MUC 6 LDC2003T13 LDC03T13 English   General
SLX Corpus of Classic Sociolinguistic Interviews LDC2003T15 LDC03T15 English   General
Multiple-Translation Chinese (MTC) Part 2 LDC2003T17 LDC03T17 Chinese, English   General
ANC First Release LDC2003T20 LDC03T20 English Restricted (l) Specific
Klex: Finite-State Lexical Transducer for Korean LDC2004L01 LDC04L01 Korean   General
Buckwalter Arabic Morphological Analyzer Version 2.0 LDC2004L02 LDC04L02 Arabic   Specific
Arabic Treebank: Part 2 v 2.0 LDC2004T02 LDC04T02 Arabic   General
Santa Barbara Corpus of Spoken American English III LDC2004S10 LDC04S10 English   General
Morphologically Annotated Korean Text LDC2004T03 LDC04T03 Korean   General
Multiple-Translation Chinese (MTC) Part 3 LDC2004T07 LDC04T07 Chinese, English   General
Hong Kong Parallel Text LDC2004T08 LDC04T08 Chinese, English   Specific
TIDES Extraction (ACE) 2003 Multilingual Training Data LDC2004T09 LDC04T09 Arabic, Chinese, English   General
Proposition Bank I LDC2004T14 LDC04T14 English   General
Chinese Treebank 5.0 LDC2005T01 LDC05T01 Chinese   General
Arabic Treebank: Part 1 v 3.0 (POS with full vocalization + syntactic analysis) LDC2005T02 LDC05T02 Arabic   General
Multiple-Translation Arabic (MTA) Part 2 LDC2005T05 LDC05T05 Arabic, English   General
Chinese News Translation Text Part 1 LDC2005T06 LDC05T06 Chinese, English   General
ACE 2004 Multilingual Training Corpus LDC2005T09 LDC05T09 Arabic, Chinese, English   General
Chinese English News Magazine Parallel Text LDC2005T10 LDC05T10 Chinese, English   General
CCGbank LDC2005T13 LDC05T13 English   General
Arabic Treebank: Part 3 (full corpus) v 2.0 (MPG + Syntactic Analysis) LDC2005T20 LDC05T20 Arabic   General
2006 CoNLL Shared Task Training Data LDC2006E01 LDC06E01 Arabic, Czech   Specific: Arabic, Czech
2006 CoNLL Shared Task Test Data LDC2006E02 LDC06E02 Arabic, Czech   Specific: Arabic, Czech
Prague Dependency Treebank 2.0 LDC2006T01 LDC06T01 Czech   Specific
Korean Treebank Annotation v 2.0 LDC2006T09 LDC06T09 Korean   General
English-Arabic Treebank v 1.0 LDC2006T10 LDC06T10 Arabic, English   General
Spanish Gigaword 1st Edition LDC2006T12 LDC06T12 Spanish   General
Middle East Technical University Turkish Microphone Speech v 1.0 LDC2006S33 LDC06S33 Turkish   General
>
>
Lab members who would like access to a corpus listed as "Available" in the database should send an email to linghelp@u with a request for it to be installed.
 
Added:
>
>
Lab members who would like access to a corpus not listed in the database should list it on CorpusWishList, and send an email to Emily (ebender at u) with the request.
 
Changed:
<
<
-- FeiXia - 18 May 2007
>
>
-- DavidBrodbeck - 07 Jun 2007
 

Revision 7 - 2007-05-17 - FeiXia

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Corpus Availability and Use

Line: 72 to 72
 
2006 CoNLL Shared Task Test Data LDC2006E02 LDC06E02 Arabic, Czech   Specific: Arabic, Czech
Prague Dependency Treebank 2.0 LDC2006T01 LDC06T01 Czech   Specific
English-Arabic Treebank v 1.0 LDC2006T10 LDC06T10 Arabic, English   General
Changed:
<
<
Middle East Technical University Turkish Microphone Speech v 1.0 LDC2006S33 LDC06S33 Turkish   General
>
>
Spanish Gigaword 1st Edition LDC2006T12 LDC06T12 Spanish   General
Middle East Technical University Turkish Microphone Speech v 1.0 LDC2006S33 LDC06S33 Turkish   General
 

-- EmilyBender - 04 May 2007

Revision 6 - 2007-05-08 - EmilyBender

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Corpus Availability and Use

Line: 31 to 31
 
European Language Newspaper Text LDC95T11 LDC95T11 English Restricted (l) Specific
Mandarin Chinese News Text LDC95T13 LDC95T13 Mandarin   Specific
Hansard French/English LDC95T20 LDC95T20 French, English   General
Added:
>
>
CELEX2 LDC1996L14 LDC96L14 Dutch, German, English   Specific
 
CALLFRIEND American English-Non-Southern Dialect LDC96S46 LDC_CALLFRIEND_DISC_{1,2,3} English   General
DSO Corpus of Sense-Tagged English LDC97T12 LDC97T12 English   General
CALLHOME Egyptian Arabic Transcripts LDC97T19 LDC97T19 Arabic   General

Revision 5 - 2007-05-04 - EmilyBender

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Corpus Availability and Use

Line: 43 to 43
 
Chinese-English Translation Lexicon (v3.0) LDC2002L27 LDC02L27 Chinese, English   General
Multiple-Translation Chinese Corpus LDC2002T01 LDC02T01 Chinese, English   General
The AQUAINT Corpus of English News Text LDC2002T31 LDC02T31 English   General
Changed:
<
<
Chinese <-> English Name Entity Lists LDC2003E01 LDC03E01 Chinese, English ?? ??
>
>
Chinese <-> English Name Entity Lists LDC2003E01 LDC03E01 Chinese, English   General
 
Arabic Treebank: Part 1 v 2.0 LDC2003T06 LDC03T06 Arabic, English   General
Arabic Gigaword LDC2003T12 LDC03T12 Arabic   General
MUC 6 LDC2003T13 LDC03T13 English   General
Line: 51 to 51
 
Multiple-Translation Chinese (MTC) Part 2 LDC2003T17 LDC03T17 Chinese, English   General
ANC First Release LDC2003T20 LDC03T20 English Restricted (l) Specific
Klex: Finite-State Lexical Transducer for Korean LDC2004L01 LDC04L01 Korean   General
Added:
>
>
Buckwalter Arabic Morphological Analyzer Version 2.0 LDC2004L02 LDC04L02 Arabic   Specific
Arabic Treebank: Part 2 v 2.0 LDC2004T02 LDC04T02 Arabic   General
 
Santa Barbara Corpus of Spoken American English III LDC2004S10 LDC04S10 English   General
Morphologically Annotated Korean Text LDC2004T03 LDC04T03 Korean   General
Multiple-Translation Chinese (MTC) Part 3 LDC2004T07 LDC04T07 Chinese, English   General
Line: 62 to 64
 
Multiple-Translation Arabic (MTA) Part 2 LDC2005T05 LDC05T05 Arabic, English   General
Chinese News Translation Text Part 1 LDC2005T06 LDC05T06 Chinese, English   General
ACE 2004 Multilingual Training Corpus LDC2005T09 LDC05T09 Arabic, Chinese, English   General
Added:
>
>
Chinese English News Magazine Parallel Text LDC2005T10 LDC05T10 Chinese, English   General
CCGbank LDC2005T13 LDC05T13 English   General
Arabic Treebank: Part 3 (full corpus) v 2.0 (MPG + Syntactic Analysis) LDC2005T20 LDC05T20 Arabic   General
2006 CoNLL Shared Task Training Data LDC2006E01 LDC06E01 Arabic, Czech   Specific: Arabic, Czech
2006 CoNLL Shared Task Test Data LDC2006E02 LDC06E02 Arabic, Czech   Specific: Arabic, Czech
 
Prague Dependency Treebank 2.0 LDC2006T01 LDC06T01 Czech   Specific
Added:
>
>
English-Arabic Treebank v 1.0 LDC2006T10 LDC06T10 Arabic, English   General
Middle East Technical University Turkish Microphone Speech v 1.0 LDC2006S33 LDC06S33 Turkish   General
 
Changed:
<
<
-- EmilyBender - 07 Sep 2006
>
>

-- EmilyBender - 04 May 2007

 

Revision 4 - 2006-11-10 - WilliamLewis

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Corpus Availability and Use

Line: 43 to 43
 
Chinese-English Translation Lexicon (v3.0) LDC2002L27 LDC02L27 Chinese, English   General
Multiple-Translation Chinese Corpus LDC2002T01 LDC02T01 Chinese, English   General
The AQUAINT Corpus of English News Text LDC2002T31 LDC02T31 English   General
Changed:
<
<
?? Chinese <-> English Name Entity Lists LDC03E01 ?? ?? ??
>
>
Chinese <-> English Name Entity Lists LDC2003E01 LDC03E01 Chinese, English ?? ??
 
Arabic Treebank: Part 1 v 2.0 LDC2003T06 LDC03T06 Arabic, English   General
Arabic Gigaword LDC2003T12 LDC03T12 Arabic   General
MUC 6 LDC2003T13 LDC03T13 English   General

Revision 3 - 2006-11-09 - WilliamLewis

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Corpus Availability and Use

Line: 43 to 43
 
Chinese-English Translation Lexicon (v3.0) LDC2002L27 LDC02L27 Chinese, English   General
Multiple-Translation Chinese Corpus LDC2002T01 LDC02T01 Chinese, English   General
The AQUAINT Corpus of English News Text LDC2002T31 LDC02T31 English   General
Changed:
<
<
?? ?? LDC03E01 ?? ?? ??
>
>
?? Chinese <-> English Name Entity Lists LDC03E01 ?? ?? ??
Arabic Treebank: Part 1 v 2.0 LDC2003T06 LDC03T06 Arabic, English   General
 
Arabic Gigaword LDC2003T12 LDC03T12 Arabic   General
MUC 6 LDC2003T13 LDC03T13 English   General
SLX Corpus of Classic Sociolinguistic Interviews LDC2003T15 LDC03T15 English   General
Line: 57 to 58
 
TIDES Extraction (ACE) 2003 Multilingual Training Data LDC2004T09 LDC04T09 Arabic, Chinese, English   General
Proposition Bank I LDC2004T14 LDC04T14 English   General
Chinese Treebank 5.0 LDC2005T01 LDC05T01 Chinese   General
Added:
>
>
Arabic Treebank: Part 1 v 3.0 (POS with full vocalization + syntactic analysis) LDC2005T02 LDC05T02 Arabic   General
 
Multiple-Translation Arabic (MTA) Part 2 LDC2005T05 LDC05T05 Arabic, English   General
Chinese News Translation Text Part 1 LDC2005T06 LDC05T06 Chinese, English   General
ACE 2004 Multilingual Training Corpus LDC2005T09 LDC05T09 Arabic, Chinese, English   General
Added:
>
>
Prague Dependency Treebank 2.0 LDC2006T01 LDC06T01 Czech   Specific
  -- EmilyBender - 07 Sep 2006

Revision 2 - 2006-09-11 - EmilyBender

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Changed:
<
<
+++UW Computational Linguistics Laboratory ++Corpora List and Access Policies
>
>

Corpus Availability and Use

Access Policies

  In order to ensure compliance with the licenses for the various corpora we have installed, we have instituted the following policies.
Line: 12 to 13
 
  5. Whenever you use a corpus for course work or for a paper, you should cite the corpus among your references. The proper citation information should be found in the license or README file of the corpus.
  6. Failure to follow these policies could result in loss of access to the corpora, or to the lab/servers in general.
Added:
>
>

Requesting additional corpora

Lab members who would like access to a corpus not listed below should list it on CorpusWishList, and send an email to Emily (ebender at u) with the request.

 
Changed:
<
<
++Corpora installed on Pongo
>
>

Instructions for Instructors

 
Changed:
<
<
Title LDC Catalogue Number Directory Name Restricted Access Language(s) License
  LDC94T05 LDC94T05
  LDC94T4A LDC94T4A
  LDC95T07 LDC95T07
  LDC95T08 LDC95T08
  LDC95T11 LDC95T11
  LDC95T13 LDC95T13
  LDC95T20 LDC95T20
  LDC97T12 LDC97T12
  LDC97T19 LDC97T19
  LDC99T34 LDC99T34
  LDC99T40 LDC99T40
  LDC99T41 LDC99T41
  LDC99T42 LDC99T42
    LDC_CALLFRIEND_DISC_{1,2,3}
  LDC00T45 LDC00T45
  LDC01T02 LDC01T02
  LDC02L27 LDC02L27
  LDC02T01 LDC02T01
  LDC02T31 LDC02T31
  LDC03E01 LDC03E01
  LDC03T12 LDC03T12
  LDC03T13 LDC03T13
  LDC03T15 LDC03T15
  LDC03T17 LDC03T17
  LDC03T20 LDC03T20
  LDC04L01 LDC04L01
LDC04S10 LDC04T03 LDC04T07 LDC04T08 LDC04T09 LDC04T14 LDC05T01 LDC05T05 LDC05T06 LDC05T09
>
>
When you request that a new corpus be installed, please add its information to the table below. To determine whether the corpus has a specific license (beyond the LDC general license), look at its catalogue entry. Corpora with specific licenses have a line that says "Member license: yes", and links to the license. Read the license carefully to determine whether we need to maintain a list of users who have access (the "Restricted (l)" category) and/or need to have users sign an individual license in order to access the corpus (the "Restricted (s)" category).
 
Added:
>
>

Corpora installed on Pongo

Title/Catalogue link LDC Catalogue Number Directory Name Language(s) Restricted Access License
UN Parallel Text (Complete) LDC94T4A LDC94T4A French, Spanish, English   Specific
ECI Multilingual Text LDC94T5 LDC94T05 Multi   Specific
English Treebank 2 LDC95T7 LDC95T07 English   General
Japanese Business News Text LDC95T8 LDC95T08 Japanese Restricted (s) Specific
European Language Newspaper Text LDC95T11 LDC95T11 English Restricted (l) Specific
Mandarin Chinese News Text LDC95T13 LDC95T13 Mandarin   Specific
Hansard French/English LDC95T20 LDC95T20 French, English   General
CALLFRIEND American English-Non-Southern Dialect LDC96S46 LDC_CALLFRIEND_DISC_{1,2,3} English   General
DSO Corpus of Sense-Tagged English LDC97T12 LDC97T12 English   General
CALLHOME Egyptian Arabic Transcripts LDC97T19 LDC97T19 Arabic   General
Japanese Business News Text Supplement LDC99T34 LDC99T34 Japanese Restricted (s) Specific
Portuguese Newswire Text LDC99T40 LDC99T40 Portuguese   General
Spanish Newswire Text, Volume 2 LDC99T41 LDC99T41 Spanish   General
Treebank-3 LDC99T42 LDC99T42 English   General
Korean Newswire LDC2000T45 LDC00T45 Korean   General
MUC 7 LDC2001T02 LDC01T02 English   General
Chinese-English Translation Lexicon (v3.0) LDC2002L27 LDC02L27 Chinese, English   General
Multiple-Translation Chinese Corpus LDC2002T01 LDC02T01 Chinese, English   General
The AQUAINT Corpus of English News Text LDC2002T31 LDC02T31 English   General
?? ?? LDC03E01 ?? ?? ??
Arabic Gigaword LDC2003T12 LDC03T12 Arabic   General
MUC 6 LDC2003T13 LDC03T13 English   General
SLX Corpus of Classic Sociolinguistic Interviews LDC2003T15 LDC03T15 English   General
Multiple-Translation Chinese (MTC) Part 2 LDC2003T17 LDC03T17 Chinese, English   General
ANC First Release LDC2003T20 LDC03T20 English Restricted (l) Specific
Klex: Finite-State Lexical Transducer for Korean LDC2004L01 LDC04L01 Korean   General
Santa Barbara Corpus of Spoken American English III LDC2004S10 LDC04S10 English   General
Morphologically Annotated Korean Text LDC2004T03 LDC04T03 Korean   General
Multiple-Translation Chinese (MTC) Part 3 LDC2004T07 LDC04T07 Chinese, English   General
Hong Kong Parallel Text LDC2004T08 LDC04T08 Chinese, English   Specific
TIDES Extraction (ACE) 2003 Multilingual Training Data LDC2004T09 LDC04T09 Arabic, Chinese, English   General
Proposition Bank I LDC2004T14 LDC04T14 English   General
Chinese Treebank 5.0 LDC2005T01 LDC05T01 Chinese   General
Multiple-Translation Arabic (MTA) Part 2 LDC2005T05 LDC05T05 Arabic, English   General
Chinese News Translation Text Part 1 LDC2005T06 LDC05T06 Chinese, English   General
ACE 2004 Multilingual Training Corpus LDC2005T09 LDC05T09 Arabic, Chinese, English   General
  -- EmilyBender - 07 Sep 2006
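
The Directory Name column in the table above appears to be derived from the LDC Catalogue Number: the year is shortened to two digits and a single-digit serial is zero-padded (LDC94T5 becomes LDC94T05, LDC2002L27 becomes LDC02L27). The page never states this rule, so the Python sketch below is only a convention inferred from the rows; the function name `directory_name` is hypothetical, and some entries do not follow it (CALLFRIEND, LDC96S46, is installed as LDC_CALLFRIEND_DISC_{1,2,3}).

```python
import re

# Assumed shape of a catalogue number: "LDC", a 4- or 2-digit year,
# one letter for the corpus type, then a serial with an optional letter suffix.
_CATALOGUE = re.compile(r"^LDC(\d{4}|\d{2})([A-Z])([0-9]+[A-Z]?)$")


def directory_name(catalogue_number: str) -> str:
    """Guess the server directory name for an LDC catalogue number.

    This mirrors a pattern inferred from the table rows; it is not a
    documented rule, and exceptions such as CALLFRIEND exist.
    """
    match = _CATALOGUE.match(catalogue_number)
    if match is None:
        raise ValueError(f"unrecognized catalogue number: {catalogue_number!r}")
    year, kind, serial = match.groups()
    # Keep the last two digits of the year; pad the serial to two characters.
    return f"LDC{year[-2:]}{kind}{serial.zfill(2)}"


if __name__ == "__main__":
    for number in ("LDC94T5", "LDC94T4A", "LDC1996L14", "LDC2002L27"):
        print(number, "->", directory_name(number))
```

Because the mapping is a guess, the "Directory Name" column remains the authority for where a corpus actually lives on the server.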

Revision 1 - 2006-09-07 - EmilyBender

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="WebHome"
+++UW Computational Linguistics Laboratory ++Corpora List and Access Policies

In order to ensure compliance with the licenses for the various corpora we have installed, we have instituted the following policies.

  1. Compling laboratory members are granted access to corpora solely for coursework and research projects in the context of their affiliation with the UW.
  2. Corpora may not be copied from the servers, nor used in commercial applications.
  3. Many of the corpora have additional licensing conditions (see links in the list below). Before you access any particular corpus, you are responsible for reading and understanding the license, in addition to the general membership agreement.
  4. For some of the corpora (marked "Restricted Access"), we must maintain a list of individuals granted access and/or have each user sign an individual license agreement. To access these corpora, you'll need to contact the lab director to obtain read permissions on the relevant directories.
  5. Whenever you use a corpus for course work or for a paper, you should cite the corpus among your references. The proper citation information should be found in the license or README file of the corpus.
  6. Failure to follow these policies could result in loss of access to the corpora, or to the lab/servers in general.

++Corpora installed on Pongo

Title LDC Catalogue Number Directory Name Restricted Access Language(s) License
  LDC94T05 LDC94T05
  LDC94T4A LDC94T4A
  LDC95T07 LDC95T07
  LDC95T08 LDC95T08
  LDC95T11 LDC95T11
  LDC95T13 LDC95T13
  LDC95T20 LDC95T20
  LDC97T12 LDC97T12
  LDC97T19 LDC97T19
  LDC99T34 LDC99T34
  LDC99T40 LDC99T40
  LDC99T41 LDC99T41
  LDC99T42 LDC99T42
    LDC_CALLFRIEND_DISC_{1,2,3}
  LDC00T45 LDC00T45
  LDC01T02 LDC01T02
  LDC02L27 LDC02L27
  LDC02T01 LDC02T01
  LDC02T31 LDC02T31
  LDC03E01 LDC03E01
  LDC03T12 LDC03T12
  LDC03T13 LDC03T13
  LDC03T15 LDC03T15
  LDC03T17 LDC03T17
  LDC03T20 LDC03T20
  LDC04L01 LDC04L01
LDC04S10 LDC04T03 LDC04T07 LDC04T08 LDC04T09 LDC04T14 LDC05T01 LDC05T05 LDC05T06 LDC05T09

-- EmilyBender - 07 Sep 2006

 