The ODIN Data

#Overview

ODIN stands for the Online Database of Interlinear Text. It is a collection of interlinear glossed text (IGT) instances extracted from linguistic documents on the Web.

As of version 2.0, ODIN is distributed in the Xigt format as well as plain text, and is licensed under the Creative Commons CC-BY 4.0 license. Version 2.1 adds data enriched by the INTENT project, along with numerous improvements to the cleaning and normalization of the original data. A flowchart describing how INTENT enriches IGTs is available here.

#Download ODIN

| Version | Date | Description |
| --- | --- | --- |
| v2.1 | 2016-03-14 | IGT instances in the plain text format and in the Xigt format, as well as Xigt data enriched by INTENT. Contains 158,007 IGT instances from 2,027 documents covering 1,496 languages. Links: Download, Changelog, Readme |
| v2.0 | 2014-07-05 | IGT instances in the plain text format and the Xigt format. Contains 158,007 IGT instances from 2,027 documents covering 1,496 languages. Links: Download, Changelog |
| v1.0 | | First release. A GUI search interface is hosted by The LINGUIST List website. Links: View |
ODIN by the RiPLes Project is licensed under a Creative Commons Attribution 4.0 International License.

#Citation

If you make use of ODIN in your research, please cite the following papers:

#Related Publications

#Example

The following Icelandic [isl] example is from:

Sigurðsson, Halldór Ármann. "The Icelandic Noun Phrase: Central Traits." Arkiv för nordisk filologi 121 (2006): 193-236. [pdf]

The example has been converted into the Xigt format and enriched by INTENT. Not all annotations are shown; the original XML file is here. The example is visualized with the XigtViz IGT renderer. Interlinear annotations are shown in columns, and all annotations can be seen by hovering your mouse cursor over an item. The immediate target of annotation has a blue border, while ancestors are lightly shaded.
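
For readers who would rather inspect the annotations programmatically than through the renderer, the following is a minimal sketch of walking a Xigt-style XML document with Python's standard library. It assumes the usual Xigt nesting of `xigt-corpus`, `igt`, `tier`, and `item` elements. The sample document is hand-written for illustration and is not taken from ODIN, and the function name `summarize_igts` is hypothetical. In real ODIN files, many items carry no text of their own and instead point at other items through attributes, so this sketch only reports text that is stored directly on an item.

```python
import xml.etree.ElementTree as ET

# Hand-written illustrative sample, NOT real ODIN data. The element and
# attribute names (xigt-corpus, igt, tier, item, type, id, alignment)
# are assumptions based on the Xigt format.
SAMPLE = """<xigt-corpus>
  <igt id="igt1">
    <tier type="words" id="w">
      <item id="w1">word1</item>
      <item id="w2">word2</item>
    </tier>
    <tier type="glosses" id="g" alignment="w">
      <item id="g1" alignment="w1">gloss1</item>
      <item id="g2" alignment="w2">gloss2</item>
    </tier>
  </igt>
</xigt-corpus>"""


def summarize_igts(xml_text):
    """Return {igt_id: {tier_type: [item texts]}} for a Xigt-like document."""
    root = ET.fromstring(xml_text)
    summary = {}
    for igt in root.iter("igt"):
        tiers = {}
        for tier in igt.findall("tier"):
            # Items without inline text are recorded as empty strings.
            texts = [(item.text or "") for item in tier.findall("item")]
            tiers.setdefault(tier.get("type"), []).extend(texts)
        summary[igt.get("id")] = tiers
    return summary


if __name__ == "__main__":
    for igt_id, tiers in summarize_igts(SAMPLE).items():
        print(igt_id)
        for tier_type, items in tiers.items():
            print(f"  {tier_type}: {items}")
```

To run the same function on a downloaded file, read its contents into a string and pass that to `summarize_igts`. For very large corpus files, a streaming parser such as `ET.iterparse` may be a better fit than loading the whole document at once.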

#Links

Projects involved in the creation and use of ODIN data include:

#Acknowledgments

Work on ODIN (and related projects) has been funded in part by the following grants: