is one of three interrelated resources (the other two are TransGraph and PanDictionary) that aggregate other lexical resources into a unified multilingual lexical resource.
The sources are dictionaries, glossaries, word lists, thesauri, vocabulary databases, WordNets, subject-heading authorities, and other monolingual, bilingual, and multilingual collections of lexeme lemmata.
The unified resource is a database of meanings and expressions, with additional data from the sources. TransGraph and PanDictionary add inference tools to the data to enable translations not attested by any of the sources.
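To make the meaning-and-expression model concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the actual schema or the inference method used by TransGraph or PanDictionary: the class `LexicalDatabase`, its method names, the source labels, and the three-letter language-variety codes are all hypothetical. An expression is modeled as a (language variety, text) pair, a meaning as a set of expressions taken from one source, and an unattested translation is guessed naively by passing through one pivot expression.

```python
from collections import defaultdict


class LexicalDatabase:
    """Hypothetical store of meanings and expressions (illustration only)."""

    def __init__(self):
        self.meanings = {}                     # meaning id -> (source, expressions)
        self.by_expression = defaultdict(set)  # expression -> meaning ids
        self._next_id = 0

    def add_meaning(self, source, expressions):
        """Record one meaning from one source, denoted by several expressions."""
        meaning_id = self._next_id
        self._next_id += 1
        exprs = frozenset(expressions)
        self.meanings[meaning_id] = (source, exprs)
        for expr in exprs:
            self.by_expression[expr].add(meaning_id)
        return meaning_id

    def attested(self, expr):
        """Expressions that share at least one meaning with expr in some source."""
        found = set()
        for meaning_id in self.by_expression.get(expr, ()):
            found |= self.meanings[meaning_id][1]
        found.discard(expr)
        return found

    def inferred(self, expr, target_variety):
        """Naive one-pivot inference: candidates in target_variety that are
        attested translations of an attested translation of expr, but are
        not themselves attested for expr."""
        direct = self.attested(expr)
        candidates = set()
        for pivot in direct:
            for other in self.attested(pivot):
                if other != expr and other not in direct and other[0] == target_variety:
                    candidates.add(other)
        return candidates


# Two hypothetical sources: an English-French and a French-German word list.
db = LexicalDatabase()
db.add_meaning("source-A", [("eng", "dog"), ("fra", "chien")])
db.add_meaning("source-B", [("fra", "chien"), ("deu", "Hund")])

print(db.attested(("eng", "dog")))         # the French expression only
print(db.inferred(("eng", "dog"), "deu"))  # the German expression, via the French pivot
```

A single pivot is unreliable when the pivot expression is polysemous, so a realistic inference tool would need to weigh evidence from many paths; this sketch shows only the shape of the data and of the question being asked of it.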
The current size indicators are: 586 sources, 1,256 language varieties, 10.7 million expressions, and 79 million attested translation pairs.
When the current processing backlog has been cleared, these are expected to reach about 1,500 sources, 1,500 language varieties, 30 million expressions, and 150 million translation pairs.
For a window on TransGraph and PanDictionary, see PanImages.
The problem I want your advice on is this: How should the 1,500 sources' schemata be integrated into a single schema? What data should be collected, what distinctions should be preserved, and what taxonomy should the data be organized into? I shall present the current answers to these questions and ask what changes would serve your research goals.
-- Main.jpool - 02 Oct 2008