OntoLearn Reloaded

OntoLearn Reloaded is a graph-based approach to automatically learning a lexical taxonomy from a domain corpus and the Web. The system is based on Word-Class Lattices and a taxonomy learning algorithm developed by Roberto Navigli, Paola Velardi and Stefano Faralli.

Dataset

We are releasing terminologies and taxonomies for the following domains: Artificial Intelligence, Finance, Animals, Plants, Vehicles and Viruses. The terminologies are either gold standard or automatically extracted from a domain corpus. The taxonomies are distributed as tab-separated values (TSV), with each row given as (term, hypernym, gloss) when a gloss is available and (term, hypernym) otherwise. For each domain we release a tree-like taxonomy (TREE) and two directed acyclic graphs (DAG_1_3, DAG_0_99) obtained with different parameters of our graph-based approach.
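As an illustration of the row format described above, the following is a minimal Python sketch for reading a taxonomy file. It assumes the files have no header row and that a term may appear on several rows in the DAG versions. The function name read_taxonomy and the sample rows are hypothetical and are not taken from the released files.

```python
import csv
import io
from collections import defaultdict


def read_taxonomy(lines):
    """Parse (term, hypernym[, gloss]) rows into hypernym and gloss maps."""
    hypernyms = defaultdict(list)  # term -> list of its hypernyms
    glosses = {}                   # (term, hypernym) -> gloss text
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed rows
        term, hypernym = row[0].strip(), row[1].strip()
        hypernyms[term].append(hypernym)
        # The third column is optional: keep the gloss only when present.
        if len(row) > 2 and row[2].strip():
            glosses[(term, hypernym)] = row[2].strip()
    return dict(hypernyms), glosses


# Hypothetical sample rows, made up for illustration only.
sample = io.StringIO(
    "neural network\tmodel\ta computational model inspired by the brain\n"
    "perceptron\tneural network\n"
    "perceptron\tclassifier\n"
)
hypernyms, glosses = read_taxonomy(sample)
```

To read a downloaded file instead of the in-memory sample, an open file object can be passed in the same way, for example open(path, encoding="utf-8", newline=""), where the encoding is an assumption. In a TREE file each term would be expected to map to a single hypernym, while the DAG files may list several.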

New: we are also releasing the OWL/RDF version and the Lemon version of the ontologies.

Downloads

* The terminology for the Animals, Plants and Vehicles domains was kindly provided by Zornitsa Kozareva and Ed Hovy (note that terms are in their plural form).

Additional Downloads

We are also releasing the output of OntoLearn Reloaded on additional domain-specific corpora:

References

If you use our dataset in your own work or publish new work on the topic, please cite the following paper:

Paola Velardi, Stefano Faralli, Roberto Navigli. OntoLearn Reloaded: A Graph-based Algorithm for Taxonomy Induction. Computational Linguistics, 39(3), MIT Press, 2013.


Last update: 06 Oct 2013 by Stefano Faralli