==============================================================================================================================================
SEW: Automatic Construction and Evaluation of a Large Semantically Enriched Wikipedia
Alessandro Raganato, Claudio Delli Bovi and Roberto Navigli
==============================================================================================================================================

This package contains the two vector representations constructed using SEW, a sense-annotated corpus automatically built from Wikipedia.
The sense inventory of SEW is BabelNet (http://babelnet.org), the largest multilingual encyclopedic dictionary and semantic network.

Two types of vector representation are available:

  - WB-SEW: a vector representation for BabelNet synsets in which dimensions are Wikipedia pages;
  - SB-SEW: a vector representation for Wikipedia pages in which dimensions are BabelNet synsets.

Both vector representations (WB-SEW and SB-SEW) are available in two different versions:

  - One where frequencies are estimated using raw counts (*.rc.tsv);
  - One where frequencies are estimated using lexical specificity (*.ls.tsv).

For more information, please refer to Section 6.3 (Extrinsic Evaluation: Semantic Similarity) of the reference paper.
More details on the file format are given below.

==============================================================================================================================================
FORMAT OF THE VECTOR REPRESENTATION FILES
==============================================================================================================================================

Each vector representation file is in tab-separated (.tsv) format, with a single vector on each line. The format is as follows:

  ENTITY \t \t COMPONENT:VALUE \t COMPONENT:VALUE \t ... \t COMPONENT:VALUE

where ENTITY is either the BabelNet synset (WB-SEW) or the Wikipedia page (SB-SEW) being represented by the vector,
and the COMPONENT:VALUE pairs constitute the non-zero dimensions of the vector.
Each COMPONENT is either a Wikipedia page (WB-SEW) or a BabelNet synset (SB-SEW).

==============================================================================================================================================
REFERENCE PAPER
==============================================================================================================================================

When using these data, please refer to the following paper:

  Alessandro Raganato, Claudio Delli Bovi and Roberto Navigli.
  Automatic Construction and Evaluation of a Large Semantically Enriched Wikipedia.
  Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI-16),
  New York City, New York, USA, 9-15 July 2016.

==============================================================================================================================================
CONTACT
==============================================================================================================================================

For any enquiry related to SEW, please contact:

  - Alessandro Raganato (raganato [at] di.uniroma1 [dot] it)
  - Claudio Delli Bovi (dellibovi [at] di.uniroma1 [dot] it)
  - Roberto Navigli (navigli [at] di.uniroma1 [dot] it)

==============================================================================================================================================
LICENSES
==============================================================================================================================================

All vector representations constructed from SEW are licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.
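
==============================================================================================================================================
EXAMPLE: READING A VECTOR FILE
==============================================================================================================================================

The line format described in the FORMAT section above can be read with a few lines of Python. The sketch below is not part of the
original package; the function names parse_vector_line and read_vectors are hypothetical. It rests on the following assumptions:
the files are UTF-8 text, empty fields (such as the one produced by the two consecutive tabs after ENTITY) can be skipped,
every VALUE parses as a number, and a COMPONENT may itself contain colons, so each pair is split at its last colon.

```python
import io
from typing import Dict, Iterator, TextIO, Tuple


def parse_vector_line(line: str) -> Tuple[str, Dict[str, float]]:
    """Parse one line into (ENTITY, {COMPONENT: VALUE}).

    Assumption: fields are tab-separated and empty fields carry no data.
    """
    fields = [field.strip() for field in line.rstrip("\n").split("\t")]
    entity = fields[0]
    components: Dict[str, float] = {}
    for field in fields[1:]:
        if not field:
            # Skip the empty field left by consecutive tabs.
            continue
        # Split at the LAST colon: the component name (a page title or a
        # synset identifier) may contain colons of its own.
        name, sep, value = field.rpartition(":")
        if not sep:
            raise ValueError(f"malformed COMPONENT:VALUE pair: {field!r}")
        components[name] = float(value)
    return entity, components


def read_vectors(handle: TextIO) -> Iterator[Tuple[str, Dict[str, float]]]:
    """Yield one (ENTITY, components) pair per non-blank line."""
    for line in handle:
        if line.strip():
            yield parse_vector_line(line)


if __name__ == "__main__":
    # In-memory stand-in for a vector file; the entries are made up.
    sample = io.StringIO("entity_a\t\tcomp_x:3\tcomp:with:colons:1.5\n")
    for entity, components in read_vectors(sample):
        print(entity, components)
```

To read a real file, open it with open(path, encoding="utf-8") and pass the handle to read_vectors. Because the vectors are sparse,
a dictionary per entity keeps only the non-zero dimensions, matching what the files store.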