PanLex is one of three interrelated resources (the other two are TransGraph and PanDictionary) that aggregate other lexical resources into a unified multilingual lexical resource.

The sources are dictionaries, glossaries, word lists, thesauri, vocabulary databases, WordNets, subject-heading authorities, and other monolingual, bilingual, and multilingual collections of lexeme lemmata.

The unified resource is a database of meanings and expressions, with additional data from the sources. TransGraph and PanDictionary add inference tools to the data to enable translations not attested by any of the sources.
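
To make the distinction between attested and inferred translations concrete, here is a minimal sketch. It assumes a toy model in which each source contributes meanings, each meaning groups the expressions that a source treats as equivalent, and an attested translation pair is two expressions in different language varieties that share a meaning. The class and method names (`Expression`, `LexicalDB`, `attested_pairs`, `inferred_pairs`) and the language codes are hypothetical. The one-pivot inference shown is a deliberately naive stand-in, not the method TransGraph or PanDictionary actually uses.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Expression:
    variety: str  # language-variety label (hypothetical codes, e.g. "eng")
    text: str     # lemma text


class LexicalDB:
    """Toy store: (source, meaning id) -> set of expressions with that meaning."""

    def __init__(self):
        self.meanings = defaultdict(set)

    def add_meaning(self, source, meaning_id, expressions):
        self.meanings[(source, meaning_id)].update(expressions)

    def attested_pairs(self):
        # A pair is attested if some source puts both expressions under one
        # meaning. Assumption: only cross-variety pairs count as translations.
        pairs = set()
        for exprs in self.meanings.values():
            for a, b in combinations(exprs, 2):
                if a.variety != b.variety:
                    pairs.add(frozenset((a, b)))
        return pairs

    def inferred_pairs(self):
        # Naive inference: if A-P and P-B are attested, propose A-B.
        attested = self.attested_pairs()
        neighbors = defaultdict(set)
        for pair in attested:
            a, b = tuple(pair)
            neighbors[a].add(b)
            neighbors[b].add(a)
        inferred = set()
        for linked in neighbors.values():
            for a, b in combinations(linked, 2):
                candidate = frozenset((a, b))
                if a.variety != b.variety and candidate not in attested:
                    inferred.add(candidate)
        return inferred


dog = Expression("eng", "dog")
chien = Expression("fra", "chien")
hund = Expression("deu", "Hund")

db = LexicalDB()
db.add_meaning("source-A", 1, {dog, chien})   # English-French source
db.add_meaning("source-B", 1, {chien, hund})  # French-German source

attested = db.attested_pairs()
inferred = db.inferred_pairs()
```

In this toy data no source pairs "dog" with "Hund", yet the pair is proposed through the shared French expression. Pivoting of this kind yields false translations whenever the pivot expression is polysemous, which is one reason the schema's choices about what distinctions to preserve matter.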

The current size indicators are 586 sources, 1,256 language varieties, 10.7 million expressions, and 79 million attested translation pairs.

When the current processing backlog has been cleared, these figures are expected to reach about 1,500 sources, 1,500 language varieties, 30 million expressions, and 150 million translation pairs.

For a window on TransGraph and PanDictionary, see PanImages.

The problem I want your advice on is this: How should the 1,500 sources' schemata be integrated into a single schema? What data should be collected, what distinctions should be preserved, and what taxonomy should the data be organized into? I shall present the current answers to these questions and ask what changes would serve your research goals.

Slides from this talk, including a summary of audience comments
