==============================================================================================================================================
SEW-EMBED: Language-Independent Concept Representations from a Semantically Enriched Wikipedia
Claudio Delli Bovi and Alessandro Raganato
==============================================================================================================================================

This package contains the two embedded augmentations of the vector representations constructed using SEW, the Semantically Enriched Wikipedia. The sense inventory of SEW-EMBED is BabelNet (http://babelnet.org), the largest multilingual encyclopedic dictionary and semantic network.

Two versions of SEW-EMBED are available:

  - W2V, based on the Google News pre-trained Word2Vec embeddings* as external embedded representation;
  - NASARI, based on the NASARI embeddings for English** as external embedded representation.

Both versions follow the same file format. For more information, please refer to the reference paper.

(*)  https://code.google.com/archive/p/word2vec
(**) http://lcl.uniroma1.it/nasari

More details on the format are given below.

==============================================================================================================================================
FORMAT OF THE VECTOR REPRESENTATION FILES
==============================================================================================================================================

Each vector representation file is in space-separated format, with one vector per line. The format is as follows:

    SYNSET VALUE_1 VALUE_2 ... VALUE_N

where SYNSET is the BabelNet synset represented by the vector, and VALUE_1 ... VALUE_N are the numerical components of the vector. Each vector has 300 dimensions (i.e. N=300), as both external representations are based on a 300-dimensional vector space.
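
The line format described above can be read with a few lines of Python. The following is only a minimal sketch, not part of the package: the function names (parse_line, load_vectors, is_uncovered) and the sample synset identifiers and values are made up for illustration. The number of dimensions is inferred from the first line rather than hard-coded.

```python
from typing import Dict, Iterable, List, Tuple


def parse_line(line: str) -> Tuple[str, List[float]]:
    """Split one 'SYNSET VALUE_1 ... VALUE_N' line into (synset, values)."""
    parts = line.split()
    if len(parts) < 2:
        raise ValueError("expected a synset followed by at least one value")
    return parts[0], [float(value) for value in parts[1:]]


def load_vectors(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Read vectors from an iterable of lines (e.g. an open text file)."""
    vectors: Dict[str, List[float]] = {}
    dimensions = None  # inferred from the first non-empty line
    for number, raw in enumerate(lines, start=1):
        if not raw.strip():
            continue  # skip blank lines
        synset, values = parse_line(raw)
        if dimensions is None:
            dimensions = len(values)
        elif len(values) != dimensions:
            raise ValueError(f"line {number}: inconsistent vector length")
        vectors[synset] = values
    return vectors


def is_uncovered(vector: List[float]) -> bool:
    """True if every component is zero, i.e. the synset is not covered."""
    return all(value == 0.0 for value in vector)


# Illustrative input only: shortened, invented vectors with made-up identifiers.
sample = [
    "bn:00000001n 0.5 -0.25 1.0",
    "bn:00000002n 0.0 0.0 0.0",
]
vectors = load_vectors(sample)
```

To read a real file, an open file handle can be passed in place of the sample list, e.g. load_vectors(open(path, encoding="utf-8")), where path points to one of the two vector files.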
When a SYNSET is not covered by SEW-EMBED, it is represented by an all-zero vector.

==============================================================================================================================================
REFERENCE PAPER
==============================================================================================================================================

When using these data, please refer to the following paper:

    Claudio Delli Bovi and Alessandro Raganato.
    Sew-Embed at SemEval-2017 Task 2: Language-Independent Concept Representations from a Semantically Enriched Wikipedia.
    Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 252--257,
    Vancouver, Canada, 30 July-4 August 2017.

==============================================================================================================================================
CONTACT
==============================================================================================================================================

For any enquiry related to SEW, please contact:

  - Claudio Delli Bovi (dellibovi [at] di.uniroma1 [dot] it)
  - Alessandro Raganato (raganato [at] di.uniroma1 [dot] it)

==============================================================================================================================================
LICENSES
==============================================================================================================================================

All vector representations constructed from SEW-EMBED are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.