Embedding Words and Senses Together via Joint Knowledge-Enhanced Training
We present SW2V (Senses and Words to Vectors), a new model which simultaneously learns embeddings for both words and senses
as an emerging feature by exploiting knowledge from both text corpora and semantic networks in a joint training phase.
Word and sense embeddings are therefore represented in the same vector space.
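Since words and senses share a single vector space, a word vector can be compared directly with a sense vector. The following is a minimal sketch (not part of the SW2V release) of how the pre-trained vectors could be queried with gensim, assuming they are distributed in standard word2vec text format; the file name and the "lemma_bn:offset" sense-key pattern are illustrative assumptions, so check the README for the actual conventions.

    # Minimal sketch: querying jointly trained word and sense vectors.
    # Assumes word2vec text format; the file name and sense-key pattern
    # are hypothetical (see the README shipped with the SW2V code).
    from gensim.models import KeyedVectors

    vectors = KeyedVectors.load_word2vec_format("sw2v_vectors.txt", binary=False)

    word = "bank"
    sense = "bank_bn:00008364n"  # hypothetical BabelNet sense key

    # Words and senses live in the same space, so cosine similarity
    # between a word and a candidate sense is directly meaningful.
    if word in vectors and sense in vectors:
        print(vectors.similarity(word, sense))

Because both kinds of vectors are trained jointly, the same similarity call also works between two senses, or between a sense and any other word, which is what makes the shared space usable for comparing words and senses directly.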
Data and Code
Currently available files for download:
- Code for obtaining word and sense embeddings from any pre-processed corpus by applying SW2V, including a README file.
- Wikipedia corpus (dump of November 2014), preprocessed using BabelNet for direct use with the SW2V code. Download: [6GB]
- 300-dimensional pre-trained word and sense embeddings trained on:
- 300-dimensional pre-trained word and synset (entity) embeddings trained on:
  - Wikipedia (using hyperlinks as the only source of annotated data). Download: [6.5GB]
  - UMBC webbase corpus. Download: [2.5GB]
Reference paper
When using these resources, please refer to the following paper:

Massimiliano Mancini, Jose Camacho-Collados, Ignacio Iacobacci and Roberto Navigli. Embedding Words and Senses Together via Joint Knowledge-Enhanced Training. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), Vancouver, Canada, 2017.
Contact
Should you have any enquiries about these resources, please contact Massimiliano Mancini (mancini [at] dis.uniroma1 [dot] it), Jose Camacho Collados (collados [at] di.uniroma1 [dot] it), Ignacio Iacobacci (iacobacci [at] di.uniroma1 [dot] it) or Roberto Navigli (navigli [at] di.uniroma1 [dot] it).
Last update: 5 July 2017