Word-Class Lattices

Word-Class Lattices (WCLs) are a generalization of word lattices developed by Roberto Navigli and Paola Velardi to model textual definitions. Our classifiers, based on two variants of WCLs, identify definitions and extract hypernyms with high accuracy.

Datasets

We are releasing a package that contains two folders. The wikipedia folder contains the positive (wiki_good.txt) and negative (wiki_bad.txt) definition candidates extracted from Wikipedia. The ukwac folder contains over 300,000 candidate definition sentences from the ukWaC Web corpus (ukwac_testset.txt), each of which contains at least one of 239 domain terms selected from the terminology of four different domains (ukwac_terms.txt). To estimate recall, we manually checked 50,000 of these sentences and identified 99 definitional sentences (ukwac_estimated_recall.txt).
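As an illustration of how the manually checked sentences can be used, the sketch below computes recall as the fraction of gold definitional sentences that a classifier also flags. It is a minimal sketch, not part of the released package: the function name estimated_recall and the sentence identifiers are hypothetical, and it does not parse the released files, whose exact format is not described here.

```python
def estimated_recall(gold_definitions, predicted_definitions):
    """Fraction of gold definitional sentences that were also predicted."""
    gold = set(gold_definitions)
    if not gold:
        raise ValueError("the gold set must not be empty")
    # Sentences that are both truly definitional and flagged by the classifier
    hits = gold & set(predicted_definitions)
    return len(hits) / len(gold)


# Toy example with made-up sentence identifiers (not taken from the dataset)
gold = {"s1", "s2", "s3", "s4"}
predicted = {"s2", "s3", "s9"}
recall = estimated_recall(gold, predicted)
print(recall)
```

With the ukWaC data, the gold set would be the 99 definitional sentences found among the 50,000 manually checked ones, and the predicted set would be whatever a classifier marks as definitional within those same 50,000 sentences.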

Downloads

References

When citing the Word-Class Lattice algorithm and our experimental results, please refer to the following paper:

Roberto Navigli, Paola Velardi. Learning Word-Class Lattices for Definition and Hypernym Extraction. Proc. of the 48th Annual Meeting of the Association for Computational Linguistics (ACL 2010), Uppsala, Sweden, July 11-16, 2010, pp. 1318-1327

When referring to the dataset only, please cite the following paper:

Roberto Navigli, Paola Velardi, Juana María Ruiz-Martínez. An Annotated Dataset for Extracting Definitions and Hypernyms from the Web. Proc. of LREC 2010, Valletta, Malta, May 19-21, 2010, pp. 3716-3722


Last update: 2 August 2010 by Roberto Navigli