Find the word that does not belong: A Framework for an Intrinsic Evaluation of Word Vector Representations
Test your own word embeddings on the outlier detection task!
Given a group of words, the goal of the outlier detection task is to identify the word that does not belong in the group.
For example, book would be an outlier for the set of words apple, banana, lemon, book, orange, as it is not a fruit like the others.
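The Python sketch below shows one way to solve the task for a single group. It is a minimal version that assumes cosine similarity and a compactness-style heuristic: the predicted outlier is the word whose removal leaves the remaining words with the highest average pairwise similarity. The scoring rule, the function names (`cosine`, `compactness`, `find_outlier`, `accuracy`) and the toy vectors are illustrative assumptions. They are not the official evaluation script shipped in the package.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def compactness(cluster: List[str], vectors: Dict[str, Vector]) -> float:
    """Average similarity over all unordered word pairs in a cluster.

    Assumes the cluster has at least two words (at least one pair).
    """
    pairs = [(a, b) for i, a in enumerate(cluster) for b in cluster[i + 1:]]
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def find_outlier(group: List[str], vectors: Dict[str, Vector]) -> str:
    """Assumed heuristic: return the word whose removal leaves the most compact cluster."""
    def score(word: str) -> float:
        rest = [w for w in group if w != word]
        return compactness(rest, vectors)

    return max(group, key=score)


def accuracy(instances: List[Tuple[List[str], str]],
             vectors: Dict[str, Vector]) -> float:
    """Percentage of groups whose gold outlier is predicted correctly."""
    hits = sum(find_outlier(group, vectors) == gold for group, gold in instances)
    return 100.0 * hits / len(instances)


# Toy 3-dimensional vectors, made up for illustration (not real embeddings).
vectors = {
    "apple": [0.9, 0.1, 0.0],
    "banana": [0.8, 0.2, 0.1],
    "lemon": [0.85, 0.05, 0.2],
    "orange": [0.9, 0.15, 0.1],
    "book": [0.1, 0.9, 0.3],
}
group = ["apple", "banana", "lemon", "book", "orange"]
print(find_outlier(group, vectors))
print(accuracy([(group, "book")], vectors))
```

To try your own embeddings, replace `vectors` with a dictionary that maps each word to its vector. Each group must contain at least three distinct words. With fewer, removing a candidate leaves no word pairs to average.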
This task is intended to test properties of word embeddings that common intrinsic evaluation benchmarks, such as word similarity, do not fully address.
Although the task is well defined and humans achieve near-perfect performance, it remains challenging for state-of-the-art word embeddings.
In fact, the evaluation clearly highlights some of the shortcomings of current word embeddings.
Please find more information about the dataset and the outlier detection task in the reference paper.
Download
Download the whole package (<1 MB).
Reference
José Camacho-Collados and Roberto Navigli.
Find the word that does not belong: A Framework for an Intrinsic Evaluation of Word Vector Representations.
In Proceedings of the ACL Workshop on Evaluating Vector Space Representations for NLP, pages 43-50, Berlin, Germany, August 12, 2016.
@inproceedings{camacho2016find,
title={Find the word that does not belong: A framework for an intrinsic evaluation of word vector representations},
author={Camacho-Collados, Jos{\'e} and Navigli, Roberto},
booktitle={Proceedings of the ACL Workshop on Evaluating Vector Space Representations for NLP},
pages={43--50},
year={2016}
}