Embedding Words and Senses Together via Joint Knowledge-Enhanced Training
We present SW2V (Senses and Words to Vectors), a new model which simultaneously learns embeddings for both words and senses
as an emerging feature by exploiting knowledge from both text corpora and semantic networks in a joint training phase.
Word and sense embeddings are therefore represented in the same vector space.
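Since words and senses share a single vector space, a word vector can be compared directly with a sense vector. The following is a minimal sketch (not part of the SW2V release) of how the pre-trained vectors could be queried with gensim, assuming they are distributed in standard word2vec text format; the file name and the "lemma_bn:offset" sense-key pattern are illustrative assumptions, so check the README for the actual conventions.

    # Minimal sketch: querying jointly trained word and sense vectors.
    # Assumes word2vec text format; the file name and sense-key pattern
    # are hypothetical (see the README shipped with the SW2V code).
    from gensim.models import KeyedVectors

    vectors = KeyedVectors.load_word2vec_format("sw2v_vectors.txt", binary=False)

    word = "bank"
    sense = "bank_bn:00008364n"  # hypothetical BabelNet sense key

    # Words and senses live in the same space, so cosine similarity
    # between a word and a candidate sense is directly meaningful.
    if word in vectors and sense in vectors:
        print(vectors.similarity(word, sense))

Because both kinds of vectors are trained jointly, the same similarity call also works between two senses, or between a sense and any other word, which is what makes the shared space usable for comparing words and senses directly.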
Data and Code
Currently available files for download:
- Code for obtaining word and sense embeddings from any pre-processed corpus by applying SW2V, including a README file.
- Wikipedia corpus (dump of November 2014), preprocessed using BabelNet for direct use with the SW2V code. Download: [6GB]
- 300-dimensional pre-trained word and sense embeddings trained on:
- 300-dimensional pre-trained word and synset (entity) embeddings trained on:
  - Wikipedia (using hyperlinks as the only source of annotated data). Download: [6.5GB]
  - UMBC webbase corpus. Download: [2.5GB]
Reference paper
When using these resources, please refer to the following paper:

Massimiliano Mancini, Jose Camacho-Collados, Ignacio Iacobacci and Roberto Navigli. Embedding Words and Senses Together via Joint Knowledge-Enhanced Training. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), Vancouver, Canada, 2017.
Contact
Should you have any enquiries about these resources, please contact Massimiliano Mancini (mancini [at] dis.uniroma1 [dot] it), Jose Camacho Collados (collados [at] di.uniroma1 [dot] it), Ignacio Iacobacci (iacobacci [at] di.uniroma1 [dot] it) or Roberto Navigli (navigli [at] di.uniroma1 [dot] it).
Last update: 5 July 2017