Evaluation data - Word Sense Disambiguation: A Unified Evaluation Framework

Evaluation data

Senseval-2 (Edmonds and Cotton, 2001). This dataset consists of 2282 sense annotations, including nouns, verbs, adverbs and adjectives.

Senseval-3 task 1 (Snyder and Palmer, 2004). This dataset is divided into three documents from three different domains (editorial, news story and fiction), totaling 1850 sense annotations.

SemEval-07 task 17 (Pradhan et al., 2007). This is the smallest among the five datasets, containing 455 sense annotations for nouns and verbs only.

SemEval-13 task 12 (Navigli et al., 2013). This dataset includes thirteen documents from various domains. Its original sense inventory was WordNet 3.0, the same inventory we use for all datasets. It contains 1644 sense annotations, with only nouns considered.

SemEval-15 task 13 (Moro and Navigli, 2015). This is the most recent WSD dataset available to date. It consists of 1022 sense annotations in four documents from three heterogeneous domains: biomedical, mathematics/computing and social issues.

Additionally, we release the concatenation of all five datasets above as a single evaluation dataset ("ALL").
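As a rough illustration of what such a concatenation involves, the sketch below merges per-dataset annotations into a single list. The in-memory representation (pairs of instance id and gold sense labels), the function name `concatenate_datasets`, and the convention of prefixing each id with its dataset name are assumptions made for this example; they are not taken from the released files.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory form: (instance_id, gold_sense_labels) pairs.
Annotation = Tuple[str, List[str]]


def concatenate_datasets(datasets: Dict[str, List[Annotation]]) -> List[Annotation]:
    """Merge per-dataset annotations into one list.

    Each instance id is prefixed with its dataset name (an assumed
    convention) so that ids stay unique after merging.
    """
    merged: List[Annotation] = []
    for name, annotations in datasets.items():
        for instance_id, senses in annotations:
            merged.append((f"{name}.{instance_id}", senses))
    return merged


# Toy data, invented purely for illustration (not from the real datasets).
toy = {
    "senseval2": [("d000.t000", ["sense_a"]), ("d000.t001", ["sense_b"])],
    "semeval2007": [("d000.t000", ["sense_c"])],
}
all_data = concatenate_datasets(toy)

# The merged set has one entry per original annotation, with unique ids.
assert len(all_data) == sum(len(v) for v in toy.values())
assert len({iid for iid, _ in all_data}) == len(all_data)
```

Prefixing ids matters here because two datasets may reuse the same local identifier, as the toy example does with `d000.t000`.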

You can download all the evaluation datasets here [<1MB].

References:

- Philip Edmonds and Scott Cotton. 2001. Senseval-2: Overview. In Proceedings of The Second International Workshop on Evaluating Word Sense Disambiguation Systems, pages 1–6, Toulouse, France.

- Benjamin Snyder and Martha Palmer. 2004. The English all-words task. In Proceedings of the 3rd International Workshop on the Evaluation of Systems for the Semantic Analysis of Text (SENSEVAL-3), pages 41–43, Barcelona, Spain.

- Sameer Pradhan, Edward Loper, Dmitriy Dligach, and Martha Palmer. 2007. SemEval-2007 task-17: English lexical sample, SRL and all words. In Proceedings of SemEval, pages 87–92.

- Roberto Navigli, David Jurgens, and Daniele Vannella. 2013. SemEval-2013 Task 12: Multilingual Word Sense Disambiguation. In Proceedings of SemEval 2013, pages 222–231.

- Andrea Moro and Roberto Navigli. 2015. SemEval-2015 Task 13: Multilingual all-words sense disambiguation and entity linking. In Proceedings of SemEval-2015.