EuroSense is a multilingual sense-annotated resource, automatically built via the joint disambiguation of the Europarl parallel corpus in 21 languages, with almost 123 million sense annotations for over 155 thousand distinct concepts and entities, drawn from the multilingual sense inventory of BabelNet.
EuroSense's disambiguation pipeline couples a state-of-the-art graph-based multilingual disambiguation and entity linking system, Babelfy, with a language-independent vector representation of concepts and entities, Nasari. The pipeline is designed to make the most of the cross-language complementarities of the parallel corpus, without relying on word alignments against a pivot language.
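As a rough illustration of how language-independent sense vectors can exploit a parallel corpus without word alignment, the sketch below pools candidate sense annotations from every translation of one sentence, averages their vectors into a centroid, and keeps only the senses close to that centroid. It is a minimal sketch under stated assumptions, not the EuroSense implementation: the function names (`cosine`, `centroid`, `refine`), the threshold value, the sense identifiers and the two-dimensional toy vectors are all hypothetical, and neither Babelfy nor Nasari is actually called.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def refine(candidates, sense_vectors, threshold=0.5):
    """Keep the candidate annotations whose sense vector is close to the sentence centroid.

    candidates: (language, mention, sense_id) tuples pooled from all translations
    of the same sentence; no word alignment between languages is needed.
    sense_vectors: hypothetical mapping from sense_id to a language-independent vector.
    threshold: assumed cut-off, chosen here only for illustration.
    """
    vectors = [sense_vectors[s] for _, _, s in candidates if s in sense_vectors]
    if not vectors:
        return []
    center = centroid(vectors)
    return [
        (lang, mention, s)
        for lang, mention, s in candidates
        if s in sense_vectors and cosine(sense_vectors[s], center) >= threshold
    ]


# Toy data: made-up sense identifiers and made-up 2-dimensional vectors.
toy_vectors = {
    "bank_finance": [1.0, 0.0],
    "money": [0.9, 0.1],
    "bank_river": [0.0, 1.0],
}
toy_candidates = [
    ("en", "bank", "bank_finance"),
    ("it", "banca", "bank_finance"),
    ("en", "money", "money"),
    ("en", "bank", "bank_river"),
]
kept = refine(toy_candidates, toy_vectors)
```

In this toy setting the Italian annotation reinforces the financial reading of the English "bank", so the outlying `bank_river` candidate falls below the threshold and is dropped. Because the vectors live in one shared space, evidence from any language contributes to the same centroid.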
Claudio Delli Bovi, José Camacho Collados, Alessandro Raganato and Roberto Navigli.
EuroSense: Automatic Harvesting of Multilingual Sense Annotations from Parallel Text. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), pages 594–600, Vancouver, Canada, 30 July-4 August 2017.