EuroSense

Multilingual Sense Annotations for Europarl

Powered by BabelNet

About

EuroSense is a multilingual sense-annotated resource, automatically built via the joint disambiguation of the Europarl parallel corpus in 21 languages, with almost 123 million sense annotations for over 155 thousand distinct concepts and entities, drawn from the multilingual sense inventory of BabelNet.
EuroSense's disambiguation pipeline couples Babelfy, a state-of-the-art graph-based multilingual disambiguation and entity linking system, with Nasari, a language-independent vector representation of concepts and entities. The pipeline is designed to make the most of the cross-language complementarities of the parallel corpus, without relying on word alignments against a pivot language.

Reference Paper

Claudio Delli Bovi, José Camacho Collados, Alessandro Raganato and Roberto Navigli.
EuroSense: Automatic Harvesting of Multilingual Sense Annotations from Parallel Text. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), Vancouver, Canada, 30 July-4 August 2017.


Contacts

Claudio Delli Bovi
dellibovi [at] di.uniroma1 [dot] it
bn:17381128n @ BabelNet

José Camacho Collados
collados [at] di.uniroma1 [dot] it
bn:17381131n @ BabelNet

Alessandro Raganato
raganato [at] di.uniroma1 [dot] it
bn:17381127n @ BabelNet

Roberto Navigli
navigli [at] di.uniroma1 [dot] it
bn:09353187n @ BabelNet

Download

EuroSense - High-coverage [ tar.gz: 4.9 GB ]

EuroSense - High-precision [ tar.gz: 3.7 GB ]

README

Stats/Languages

Updates