UW BHI BIONLP Change of State Event Corpus 
Release 1.0.0
Date: December 31, 2014
-----------------------

Directory contents:
-------------------

Annotation_guidelines.pdf         a guide for annotating reports with the COS annotation schema

/data                             a directory containing the corpus files

/data/annotation.conf             BRAT schema configuration file
/data/visual.conf                 BRAT color and layout rules

/data/snippet1_100.txt            text file containing line-delimited snippets numbered 1 to 100
/data/snippet1_100.ann            BRAT annotation file containing the annotations for snippets
                                  numbered 1 to 100

/publications                     a directory containing related publications
                                  
BRAT Annotation Tool
--------------------

The corpus files were annotated using the BRAT annotation tool:

http://brat.nlplab.org/index.html

Instructions for downloading and installing the tool can be found at:

http://brat.nlplab.org/installation.html

Instructions and a basic tutorial for using BRAT:

http://brat.nlplab.org/manual.html

Description of the BRAT annotation file format (.ann):

http://brat.nlplab.org/standoff.html

COS Annotation in BRAT Format
-----------------------------

The COS annotations in our annotation files are of three basic types:

1) Entities -- text-bound annotations in BRAT, whose identifiers begin with 'T'. For 
example:

T1	Cos 0 15	Slight increase

Here 'T1' is the identifier of the text-bound annotation, 'Cos' is its label, '0' and 
'15' are the start and end character offsets of the annotated text in the associated 
snippet file, and 'Slight increase' is a copy of that text.

2) Relations -- relations in BRAT, whose identifiers begin with 'R'. For example:

R5	Location Arg1:T8 Arg2:T7

Here 'R5' is the identifier of the relation, 'Location' is its label, 'Arg1:T8' names 
the text-bound annotation in the first argument position of the relation, and 'Arg2:T7' 
names the text-bound annotation in the second argument position.

3) Notes -- notes in BRAT, whose identifiers begin with '#'. For example:

#1	AnnotatorNotes T588	 ID=1 REPORT:Report_681.txt TYPE:CPIS START:261 END:355

Here '#1' is the identifier of the note, 'AnnotatorNotes' is its label, 'T588' is the 
text-bound annotation that the note is attached to, 'ID=1' is an alternate identifier 
for the note that is independent of BRAT's note ID naming convention, 
'REPORT:Report_681.txt' is the original report in which the snippet is found, 'TYPE:CPIS' 
is the rationale annotation type (either CPIS or PNA -- see Tepper.et.al.2013.pdf in our 
publications folder for more info on these types), and 'START:261' and 'END:355' are the 
character offsets of the snippet within the original report.
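
The three line types described above can be read with a few lines of Python. The sketch 
below is only an illustration and is not part of the corpus distribution: the names 
Entity, Relation, Note, parse_ann_line and span_matches are hypothetical. It assumes 
tab-separated fields as described in the BRAT standoff documentation, contiguous entity 
spans, and relations with exactly two arguments, as in the examples above.

```python
import re
from typing import NamedTuple


class Entity(NamedTuple):
    id: str
    label: str
    start: int
    end: int
    text: str


class Relation(NamedTuple):
    id: str
    label: str
    arg1: str
    arg2: str


class Note(NamedTuple):
    id: str
    label: str
    target: str
    fields: dict


def parse_ann_line(line):
    """Parse one .ann line into an Entity, Relation, or Note (hypothetical helper)."""
    parts = line.rstrip("\n").split("\t")
    ann_id = parts[0]
    if ann_id.startswith("T"):
        # Assumes a single contiguous span: "<label> <start> <end>".
        label, start, end = parts[1].split(" ")
        return Entity(ann_id, label, int(start), int(end), parts[2])
    if ann_id.startswith("R"):
        # Assumes exactly two arguments: "<label> Arg1:<id> Arg2:<id>".
        label, arg1, arg2 = parts[1].split(" ")
        return Relation(ann_id, label, arg1.split(":", 1)[1], arg2.split(":", 1)[1])
    if ann_id.startswith("#"):
        label, target = parts[1].split(" ")
        # The note text holds KEY=value or KEY:value pairs, e.g. ID=1 TYPE:CPIS.
        fields = dict(re.findall(r"(\w+)[=:](\S+)", parts[2]))
        return Note(ann_id, label, target, fields)
    raise ValueError(f"unrecognized annotation line: {line!r}")


def span_matches(entity, snippet_text):
    """Check that the entity offsets select the copied text in the snippet file text."""
    return snippet_text[entity.start:entity.end] == entity.text


if __name__ == "__main__":
    for raw in (
        "T1\tCos 0 15\tSlight increase",
        "R5\tLocation Arg1:T8 Arg2:T7",
        "#1\tAnnotatorNotes T588\t ID=1 REPORT:Report_681.txt TYPE:CPIS START:261 END:355",
    ):
        print(parse_ann_line(raw))
```

To check offsets, pass the full contents of the matching snippet file to span_matches; 
the end offset is exclusive, so it can be used directly as a Python slice bound.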

Publications
------------

The three related publications for our corpus are found in the /publications directory:

P. Klassen, F. Xia, L. Vanderwende, M. Yetisgen. Annotating Clinical Events in Text 
Snippets for Phenotype Detection. In Proceedings of the International Conference 
on Language Resources and Evaluation (LREC), Reykjavik, Iceland. May, 2014.

L. Vanderwende, F. Xia, M. Yetisgen-Yildiz. Annotating Change of State for Clinical 
Events. Proceedings of The 1st Workshop on EVENTS: Definition, Detection, Coreference, 
and Representation Workshop of NAACL'2013, 2013.    

M. Tepper, H.L. Evans, F. Xia, M. Yetisgen-Yildiz. Modeling Annotator Rationales with 
Application to Pneumonia Classification. Proceedings of Expanding the Boundaries of 
Health Informatics Using AI Workshop of AAAI'2013, 2013.