The goal of KLIUM is to learn when surface morphs (the units of a word segmentation) should be counted together as the same underlying morpheme. To do this, we extend the Morfessor system (Creutz and Lagus, 2004, 2005) with analyses generated by applying orthographic rewrite rules, so that the new statistical framework predicts both surface morphs and underlying morphemes. The input is an initial segmentation produced by Morfessor Categories-MAP 0.9.2. Suggesting underlying morphemes currently requires a small set of language-specific orthographic rules. The proposed technique yields significant improvements over the baseline system. Our system outperforms state-of-the-art unsupervised systems, particularly for Turkish, where it provides an 18% F-score improvement over the top unsupervised entrant in the 2007 Morpho Challenge contest.
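As a rough illustration of the rule-application step only, the sketch below rewrites surface suffixes into abstract forms so that variants can be counted together. The two rules, the sample segmentations, and the names `RULES`, `to_underlying`, and `count_suffixes` are all hypothetical, loosely modeled on Turkish vowel harmony and consonant alternation; they are not the rule set or code used by the system, and the statistical model that decides whether to accept a suggested analysis is not shown.

```python
import re
from collections import Counter

# Hypothetical orthographic rewrite rules (assumptions for illustration only).
# Each rule maps alternating surface letters to one abstract symbol.
RULES = [
    (re.compile(r"[ae]"), "A"),   # a ~ e alternation -> abstract A
    (re.compile(r"^[dt]"), "D"),  # morph-initial d ~ t alternation -> abstract D
]

def to_underlying(morph):
    """Apply every rule in order and return the candidate underlying form."""
    for pattern, replacement in RULES:
        morph = pattern.sub(replacement, morph)
    return morph

def count_suffixes(segmentations, mapper=lambda m: m):
    """Count non-initial morphs, optionally mapped to an underlying form.

    Treating only non-initial morphs as suffixes is a simplifying assumption.
    """
    return Counter(mapper(m) for seg in segmentations for m in seg[1:])

# Made-up initial segmentations, standing in for the segmenter's output.
segmentations = [["ev", "ler"], ["kitap", "lar"], ["ev", "de"], ["kitap", "ta"]]

surface_counts = count_suffixes(segmentations)
underlying_counts = count_suffixes(segmentations, to_underlying)

print(surface_counts)
print(underlying_counts)
```

In this toy input the four distinct surface suffixes collapse into two candidate underlying forms, which is the kind of pooling of counts that the abstract describes.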
The work is based on Michael Tepper's master's thesis and was published at IJCNLP-08, held in India last month.