Supervised systems

  • IMS (Zhong and Ng, 2010) uses a Support Vector Machine (SVM) classifier over a set of conventional WSD features. IMS is built on a flexible framework which allows easy integration of different features. The default implementation includes surrounding words, PoS tags of surrounding words, and local collocations as features.

  • IMS+embeddings. Iacobacci et al. (2016) compared different strategies for integrating word embeddings as a feature in WSD. In this empirical comparison we consider the two best configurations from Iacobacci et al. (2016): all IMS default features, either including or excluding surrounding words (IMS+emb and IMS-s+emb, respectively). In both cases word embeddings are integrated using exponential decay (i.e., word weights drop exponentially as the distance from the target word increases). Likewise, we use the learning strategy and hyperparameters suggested by Iacobacci et al. to train the word embeddings: the Skip-gram model of Word2Vec (Mikolov et al., 2013) with 400 dimensions, ten negative samples and a window size of ten words. As the unlabeled corpus for training the word embeddings we use the English ukWaC corpus (Baroni et al., 2009), which is composed of two billion words from paragraphs extracted from the web. The word embeddings used for this system can be downloaded here [2.6GB].

  • Context2Vec (Melamud et al., 2016). Neural language models have recently shown their potential for the WSD task. In this empirical comparison we replicated the approach of Melamud et al. (2016, Context2Vec), for which the code is publicly available. This approach is divided into three steps. First, a bidirectional LSTM recurrent neural network is trained on an unlabeled corpus (we used the same ukWaC corpus as for the previous comparison system). Then, a context vector is learned for each sense annotation in the training corpus. Finally, the sense annotation whose context vector is closest to the target word’s context vector is selected as the intended sense.
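
The exponential decay weighting mentioned for the IMS+emb configurations can be illustrated with a minimal sketch. It is not the IMS or Iacobacci et al. implementation: the function name, the `alpha` decay parameter and its default value, and the plain-dictionary embedding table are all assumptions made for illustration. Each context word's embedding is scaled by (1 - alpha) raised to its distance from the target minus one, so adjacent words get weight 1 and farther words get exponentially smaller weights.

```python
from typing import Dict, List, Sequence


def exp_decay_context_vector(
    tokens: Sequence[str],
    target_index: int,
    embeddings: Dict[str, List[float]],
    window: int = 10,
    alpha: float = 0.2,  # hypothetical decay rate, chosen only for illustration
) -> List[float]:
    """Sum the embeddings of the words around the target, weighting each
    by (1 - alpha) ** (distance - 1) so that weights shrink exponentially
    with the distance from the target word."""
    dim = len(next(iter(embeddings.values())))
    context = [0.0] * dim
    start = max(0, target_index - window)
    end = min(len(tokens), target_index + window + 1)
    for j in range(start, end):
        if j == target_index:
            continue  # the target word itself is not part of its context
        vector = embeddings.get(tokens[j])
        if vector is None:
            continue  # skip out-of-vocabulary words
        weight = (1.0 - alpha) ** (abs(target_index - j) - 1)
        for k in range(dim):
            context[k] += weight * vector[k]
    return context
```

The resulting vector would then be passed to the classifier as additional features alongside the default ones; how exactly the features are concatenated is left out of this sketch.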

Knowledge-based systems

  • Lesk (Lesk, 1986) is a simple knowledge-based WSD algorithm that bases its calculations on the overlap between the definitions of a given sense and the context of the target word. For our experiments we replicated the extended version of the original algorithm in which definitions of related senses are also considered and the conventional term frequency-inverse document frequency (tf-idf) is used for word weighting (Banerjee and Pedersen, 2003, Lesk ext). Additionally, we include the enhanced version of Lesk in which word embeddings are leveraged to compute the similarity between definitions and the target context (Basile et al., 2014, Lesk ext+emb).

  • UKB (Agirre et al., 2014) is a graph-based WSD system which makes use of random walks over a semantic network (the WordNet graph in this case). UKB applies the Personalized PageRank algorithm, initialized using the context of the target word. Unlike most WSD systems, UKB does not back off to the WordNet first sense heuristic and it is self-contained (i.e., it does not make use of any external resources/corpora). We use both default configurations from UKB: using the full WordNet graph (UKB) and the full graph including disambiguated glosses as connections as well (UKB-g).
    New: The authors of UKB have notified us of a recent update of UKB which improves their results by using the following configuration (UKB-g*): it uses sense distributions from SemCor, takes a context window of at least 20 words instead of only a sentence, and runs Personalized PageRank for each word.

  • Babelfy (Moro et al., 2014) is another graph-based disambiguation approach which exploits random walks. Specifically, Babelfy uses random walks with restart over BabelNet, a large semantic network integrating WordNet among other resources such as Wikipedia or Wiktionary. Its algorithm is based on a densest subgraph heuristic for selecting high-coherence semantic interpretations of the input text. The best configuration of Babelfy takes into account not only the target sentence in which the target word occurs, but also the whole document.
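
The Personalized PageRank idea behind UKB can be sketched on a toy graph. This is only an illustration under stated assumptions, not UKB's code: the function name, the adjacency-list graph format, the damping factor of 0.85, the iteration count and the node labels in the usage note are all hypothetical. The random walk restarts from the seed nodes (standing in for the context of the target word) instead of restarting uniformly, so senses that are well connected to the context accumulate more probability mass.

```python
from typing import Dict, List


def personalized_pagerank(
    graph: Dict[str, List[str]],
    seeds: Dict[str, float],
    damping: float = 0.85,  # conventional PageRank value, assumed here
    iterations: int = 50,
) -> Dict[str, float]:
    """Power iteration for PageRank with a seed-biased restart distribution.

    `graph` maps every node to its outgoing neighbours; every neighbour
    must also appear as a key. `seeds` gives the restart weights.
    """
    nodes = list(graph)
    total = sum(seeds.get(n, 0.0) for n in nodes)
    if total <= 0.0:
        raise ValueError("at least one seed must be a node of the graph")
    teleport = {n: seeds.get(n, 0.0) / total for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        # Restart step: a fraction of the mass jumps back to the seeds.
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            neighbours = graph[n]
            if not neighbours:
                # Dangling node: redistribute its mass over the seeds.
                for m in nodes:
                    new_rank[m] += damping * rank[n] * teleport[m]
                continue
            share = damping * rank[n] / len(neighbours)
            for m in neighbours:
                new_rank[m] += share
        rank = new_rank
    return rank
```

For example, with a toy graph in which `money#1` links to `bank#finance` and `water#1` links to `bank#river` (and back), seeding the walk on `money#1` ranks `bank#finance` above `bank#river`, which would then be selected as the sense of "bank".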

For more information about these systems and the state of the art on Word Sense Disambiguation, please read our reference paper.


- Zhi Zhong and Hwee Tou Ng. 2010. It Makes Sense: A wide-coverage Word Sense Disambiguation system for free text. In Proceedings of the ACL System Demonstrations, pages 78–83.

- Ignacio Iacobacci, Mohammad Taher Pilehvar, and Roberto Navigli. 2016. Embeddings for word sense disambiguation: An evaluation study. In Proceedings of ACL, pages 897–907, Berlin, Germany.

- Oren Melamud, Jacob Goldberger, and Ido Dagan. 2016. context2vec: Learning generic context embedding with bidirectional LSTM. In Proceedings of CoNLL.

- Michael Lesk. 1986. Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. In Proceedings of the 5th Annual Conference on Systems Documentation, Toronto, Ontario, Canada, pages 24–26.

- Satanjeev Banerjee and Ted Pedersen. 2003. Extended gloss overlap as a measure of semantic relatedness. In Proceedings of the 18th International Joint Conference on Artificial Intelligence, pages 805–810, Acapulco, Mexico.

- Pierpaolo Basile, Annalina Caputo, and Giovanni Semeraro. 2014. An Enhanced Lesk Word Sense Disambiguation Algorithm through a Distributional Semantic Model. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 1591–1600, Dublin, Ireland.

- Eneko Agirre, Oier Lopez de Lacalle, and Aitor Soroa. 2014. Random walks for knowledge-based word sense disambiguation. Computational Linguistics, 40(1):57–84.

- Andrea Moro, Alessandro Raganato, and Roberto Navigli. 2014. Entity Linking meets Word Sense Disambiguation: a Unified Approach. Transactions of the Association for Computational Linguistics (TACL), 2:231–244.