Word-Class Lattices (WCLs)
Word-Class Lattices (WCLs) are a generalization of word lattices developed by Roberto Navigli and Paola Velardi to model textual definitions. Our classifiers, based on two variants of WCLs, are able to identify definitions and extract hypernyms with high accuracy.
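The full algorithm is described in the papers cited below. As a rough, self-contained illustration of one core idea, generalizing infrequent words to a word class (here, a part-of-speech tag) while keeping frequent words verbatim, consider the following toy sketch. The class name WordClassDemo, the frequency counts, the threshold and the tags are all invented for this example and are not part of the released API:

```java
import java.util.*;

public class WordClassDemo {
    // Replace words whose corpus frequency is below a threshold with their
    // POS tag ("word class"); frequent words are kept verbatim.
    // NOTE: a toy sketch, not the released WCL implementation.
    static List<String> generalize(List<String[]> taggedSentence,
                                   Map<String, Integer> freq, int threshold) {
        List<String> result = new ArrayList<>();
        for (String[] wordAndTag : taggedSentence) {
            String word = wordAndTag[0], tag = wordAndTag[1];
            result.add(freq.getOrDefault(word, 0) >= threshold ? word : tag);
        }
        return result;
    }

    public static void main(String[] args) {
        // Pre-tagged toy sentence: "WCL is a classifier"
        List<String[]> sent = Arrays.asList(
            new String[]{"WCL", "NN"}, new String[]{"is", "VBZ"},
            new String[]{"a", "DT"}, new String[]{"classifier", "NN"});
        // Toy frequencies: function words frequent, content terms rare
        Map<String, Integer> freq = new HashMap<>();
        freq.put("is", 100); freq.put("a", 120);
        freq.put("WCL", 1); freq.put("classifier", 2);
        System.out.println(generalize(sent, freq, 10));
        // prints [NN, is, a, NN]
    }
}
```

Sentences that share the same generalized form can then be aligned into a single lattice, which is what allows a WCL to match unseen definitions.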
WCL Java API
We release here our implementation of Word-Class Lattices, available as a Java API download. The WCL classifier can easily be used programmatically in any Java project.
The code snippet below shows an example of API usage. After selecting the target language, we load the corresponding training dataset. An instance of WCLClassifier is then created and the training phase is launched on the input training corpus. At this point the classifier is ready to be tested on any given sentence in the target language; if its output is positive, we print the extracted hypernym. For the input sentence "WCL is a classifier.", the code prints the string "classifier", which is the hypernym extracted by WCL.
public class Test
{
    public static void main(String[] args)
    {
        // select the language of interest
        Language targetLanguage = Language.EN;
        String trainingDatasetFile = "data/training/wiki_good.EN.html";
        try
        {
            // load the training set for the target language
            AnnotatedDataset ts = new AnnotatedDataset(trainingDatasetFile, targetLanguage);
            // obtain an instance of the WCL classifier
            WCLClassifier c = new TripleLatticeClassifier(targetLanguage);
            // train the classifier on the input training corpus
            c.train(ts);
            // create a sentence to be tested
            // (the last argument is assumed here to be the target language)
            Sentence sentence = Sentence.createFromString("WCL",
                                                          "WCL is a classifier.",
                                                          targetLanguage);
            // test the sentence
            SentenceAnnotation sa = c.test(sentence);
            // print the hypernym
            if (sa.isDefinition())
                System.out.println(sa.getHyper());
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
    }
}
Manually annotated English training dataset:
The above manually annotated WCL datasets are described, together with some linguistic analysis, in the papers listed below, where they are also used to perform an experimental evaluation of WCLs.
We release a package that contains two folders, wikipedia and ukwac.
The wikipedia folder contains the positive (wiki_good.txt) and negative
(wiki_bad.txt) definition candidates extracted from Wikipedia.
The ukwac folder contains candidate definitions for over 300,000 sentences
from the ukWaC Web corpus (ukwac_testset.txt), each of which contains at
least one of 239 domain terms selected from the terminology of four
different domains (ukwac_terms.txt). To estimate recall, we manually
checked 50,000 of these sentences and identified 99 definitional sentences.
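These 99 manually identified sentences serve as the gold standard against which recall is measured. As a small worked example of the arithmetic (the number of classifier hits below is hypothetical, chosen only for illustration):

```java
public class RecallDemo {
    // Recall = correctly identified definitions / gold-standard definitions.
    static double recall(int truePositives, int goldDefinitions) {
        return (double) truePositives / goldDefinitions;
    }

    public static void main(String[] args) {
        int gold = 99;     // definitional sentences found by the manual check
        int detected = 90; // hypothetical classifier hits, for illustration only
        System.out.printf("recall = %.3f%n", recall(detected, gold));
    }
}
```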
Automatically annotated training datasets from Wikipedia:
The above automatically annotated training datasets were obtained from Wikipedia for three languages: English, French and Italian.
The dataset creation procedure is described in the paper referenced below.
1 When citing the Word-Class Lattice algorithm and our experimental results, please refer to the following paper:
2 When referring to the manually-created dataset only, please cite the following paper:
3 When referring to the automatically-created dataset and/or the WCL API, please cite the following paper:
Last update: 29 July 2013 by Stefano Faralli