We strongly believe that scientific advances can be made by sharing common resources and standardized datasets with the research community. On this page you can find many of the resources that were developed as part of our research work at LCL.

BabelNet


A very large multilingual semantic network with millions of concepts, built by integrating WordNet and Wikipedia and enriched with translations from Wikipedia's cross-language links and a state-of-the-art machine translation system.
________
Website

MORESQUE


A dataset for the evaluation of subtopic information retrieval. The dataset contains 114 topics (i.e., queries); each topic is further categorized into subtopics and comes with 100 top-ranking documents.
________
Website

TaxoLearn


A graph-based approach to automatically learning a lexical taxonomy from a domain corpus and the Web.
________
Website

WordNet++


An extension of WordNet comprising millions of new semantic pointers between WordNet synsets, harvested from co-occurring links in Wikipedia via BabelNet's mapping from Wikipedia pages to WordNet synsets.
________
Website

Word-Class Lattices (WCL)


A generalization of word lattices to model textual definitions. Our classifiers, based on two variants of WCLs, are able to identify definitions and extract hypernyms with high accuracy.
________
Website

WikiTax2WordNet


A dataset of mappings from Wikipedia categories to WordNet synsets that were automatically generated from WikiTaxonomy.
________
Website

Coarse-grained English all-words


Datasets and resources for the SemEval-2007 task #7 on coarse-grained all-words WSD for English.
________
Website

English Lexical Substitution


Datasets and resources for the SemEval-2007 task #10 on English lexical substitution.
________
Website