Task #7: Coarse-grained English all-words
One of the major obstacles to effective WSD is the fine granularity of the adopted computational lexicon. Specifically, WordNet, by far the most commonly used dictionary within the NLP community, encodes sense distinctions which are too subtle even for human annotators (Edmonds and Kilgarriff, 1998). Nonetheless, many annotated resources, as well as the vast majority of disambiguation systems, rely on WordNet as a sense inventory: as a result, choosing a different sense inventory would make it hard to retrain supervised systems and would even pose copyright problems.
Following these observations, we are organizing a coarse-grained English all-words task for SemEval-2007. We tagged approximately 6,000 words of five running texts with coarse senses. Coarse senses are based on a clustering of the WordNet sense inventory obtained via a mapping to the Oxford Dictionary of English (ODE), a long-established dictionary which encodes coarse sense distinctions. The coarse-grained sense inventory was prepared semi-automatically: starting from an automatic clustering of senses produced by Navigli (2006), we manually validated the clustering for the words occurring in the texts. Annotators tagged the texts with the coarse senses using a dedicated web interface. A judge will resolve disputed cases (given the coarse nature of the task, we expect very few such cases). For each content word, we provide the participants with its lemma and part of speech.
For disambiguation purposes, participating systems can exploit the knowledge of coarse distinctions as well as each fine-grained WordNet sense belonging to a sense cluster. Thus, supervised systems can be retrained on the usual data sets (e.g. SemCor) where a sense cluster replaces the fine-grained sense choice. Each system will provide a single coarse (and possibly fine) answer for each content word in the test set. We will provide a trial data set beforehand (as in the tradition of the previous Senseval exercises) and a test set.
Datasets and Formats
Trial and Test Set Format
The file input to systems for disambiguation (test set) will adhere to the following format:
<?xml version="1.0" encoding="iso-8859-1" ?>
<!DOCTYPE corpus SYSTEM "coarse-all-words.dtd">
<corpus lang="en">
  <text id="d000">
    <sentence id="d000.s001">
      There
      <instance id="d000.s001.t001" lemma="be" pos="v">was</instance>
      a
      <instance id="d000.s001.t002" lemma="steaming" pos="a">steaming</instance>
    </sentence>
  </text>
  <text id="d002">
  </text>
</corpus>
where each <text> tag specifies a text source whose identifier is provided by the id attribute. <sentence> tags represent single sentences within each text (again identified with an id attribute). Each <sentence> tag contains zero, one or more target words, each tagged with an <instance> element. Each instance specifies its unique identifier (id), lemma (lemma) and part of speech tag (pos). The latter can assume the values n, v, a and r for nouns, verbs, adjectives and adverbs, respectively. Instances are assumed to have an appropriate sense in the adopted sense inventory. Content words with no corresponding sense in the inventory will not be tagged.
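As an illustration of the format above, the following is a minimal sketch of how a participating system might read the test set with Python's standard xml.etree.ElementTree module. The function name read_instances and the dictionary field names are assumptions made for this example, not part of the task distribution; the sample document is the one shown above.

```python
import xml.etree.ElementTree as ET

# The sample document from the format description, as bytes so that the
# parser honours the declared iso-8859-1 encoding.
SAMPLE = b"""<?xml version="1.0" encoding="iso-8859-1" ?>
<!DOCTYPE corpus SYSTEM "coarse-all-words.dtd">
<corpus lang="en">
  <text id="d000">
    <sentence id="d000.s001">
      There
      <instance id="d000.s001.t001" lemma="be" pos="v">was</instance>
      a
      <instance id="d000.s001.t002" lemma="steaming" pos="a">steaming</instance>
    </sentence>
  </text>
  <text id="d002">
  </text>
</corpus>"""


def read_instances(xml_bytes):
    """Return one dict per <instance> element, in document order.

    Hypothetical helper: the field names are chosen for this sketch only.
    """
    root = ET.fromstring(xml_bytes)
    instances = []
    for text in root.iter("text"):
        for sentence in text.iter("sentence"):
            for inst in sentence.iter("instance"):
                instances.append({
                    "text_id": text.get("id"),
                    "sentence_id": sentence.get("id"),
                    "instance_id": inst.get("id"),
                    "lemma": inst.get("lemma"),
                    "pos": inst.get("pos"),  # one of n, v, a, r
                    "surface": (inst.text or "").strip(),
                })
    return instances


if __name__ == "__main__":
    for inst in read_instances(SAMPLE):
        print(inst["instance_id"], inst["lemma"], inst["pos"], inst["surface"])
```

Untagged words such as "There" and "a" are skipped here because they carry no <instance> element; a system that needs the full sentence context would also have to collect the text between instances.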
Coarse-grained Sense Inventory Dataset and Format
The clustering was created automatically with the aid of a methodology described in (Navigli, 2006).
spirit%1:18:01:: spirit%1:18:00:: spirit%1:26:00:: spirit%1:07:00:: spirit%1:26:01:: spirit%1:07:02:: spirit%1:07:03:: spirit%1:10:00::
WordNet senses in a cluster are represented in the WordNet sense key format and are separated by spaces.
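A minimal sketch of loading such an inventory follows. It assumes that each line of the inventory file holds exactly one cluster, which the description here does not state explicitly. The function name load_clusters is hypothetical, and the way the spirit sense keys are split over two lines in the sample is illustrative only, not the task's actual clustering.

```python
def load_clusters(lines):
    """Build a lookup from fine-grained sense key to cluster index.

    Assumption: one cluster per line, sense keys separated by whitespace.
    """
    clusters = []        # clusters[i] is the list of sense keys in cluster i
    key_to_cluster = {}  # sense key -> index into clusters
    for line in lines:
        keys = line.split()
        if not keys:
            continue  # ignore blank lines
        cluster_id = len(clusters)
        clusters.append(keys)
        for key in keys:
            key_to_cluster[key] = cluster_id
    return clusters, key_to_cluster


# Illustrative grouping only: the keys come from the example above,
# but the line breaks are invented for this sketch.
SAMPLE_LINES = [
    "spirit%1:18:01:: spirit%1:18:00::",
    "spirit%1:26:00:: spirit%1:26:01::",
]

if __name__ == "__main__":
    clusters, key_to_cluster = load_clusters(SAMPLE_LINES)
    print(len(clusters), "clusters")
    print(key_to_cluster["spirit%1:26:01::"])
```

With a lookup of this kind, the fine-grained labels of an existing training corpus can be replaced by cluster indices before a supervised system is retrained.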
Answer File Format
Systems can provide a single sense label for each instance in the test set. The format follows that of the previous SENSEVAL evaluation exercises. Systems can provide any sense in a cluster to assign the appropriate coarse sense. This will allow both systems tuned for fine-grained senses and systems exploiting the knowledge of sense groupings to participate in the evaluation exercise.
Five sample answer lines follow:
d001 d001.s001.t001 editorial%1:10:00::
d001 d001.s002.t004 principal%5:00:00:important:00
d003 d003.s056.t008 clamber%2:38:00::
d003 d003.s058.t002 gendarme%1:18:00::
d004 d004.s029.t003 painstakingly%4:02:00::
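The sample answer lines can be read with a few lines of code. The sketch below assumes three whitespace-separated fields per line (text identifier, instance identifier, sense key), as in the sample; the function name parse_answers is hypothetical.

```python
def parse_answers(lines):
    """Map each instance id to the single sense key given for it.

    Assumption: each line is "<text id> <instance id> <sense key>".
    """
    answers = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        _text_id, instance_id, sense_key = fields[:3]
        answers[instance_id] = sense_key
    return answers


SAMPLE_ANSWERS = """\
d001 d001.s001.t001 editorial%1:10:00::
d001 d001.s002.t004 principal%5:00:00:important:00
d003 d003.s056.t008 clamber%2:38:00::
d003 d003.s058.t002 gendarme%1:18:00::
d004 d004.s029.t003 painstakingly%4:02:00::
"""

if __name__ == "__main__":
    for instance_id, key in parse_answers(SAMPLE_ANSWERS.splitlines()).items():
        print(instance_id, key)
```

Sense keys never contain spaces, so splitting on whitespace is enough even for adjective satellite keys such as principal%5:00:00:important:00.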
Evaluation
Evaluation will be performed in terms of standard precision, recall and F1 scores. We will avoid words with untagged senses, i.e. the "U" cases present in the Senseval-3 all-words test set.
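The following sketch shows one way the scores could be computed, under the usual Senseval convention that precision is the number of correct answers over the number of attempted instances and recall is the number of correct answers over all gold instances. An answer counts as correct when it falls in the same cluster as the gold sense. The function name score, the treatment of unclustered keys as singleton clusters, and the toy data are all assumptions of this example; this is not the official scorer.

```python
def score(gold, system, key_to_cluster):
    """Return (precision, recall, f1) for coarse-grained answers.

    gold, system: dicts mapping instance id -> sense key.
    key_to_cluster: dict mapping sense key -> cluster id. A key missing
    from it is treated as a cluster of its own (assumption of this sketch).
    """
    def cluster_of(key):
        return key_to_cluster.get(key, key)

    attempted = 0
    correct = 0
    for instance_id, answer in system.items():
        if instance_id not in gold:
            continue  # ignore answers for instances outside the gold set
        attempted += 1
        if cluster_of(answer) == cluster_of(gold[instance_id]):
            correct += 1

    precision = correct / attempted if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data with placeholder keys, for illustration only.
    toy_clusters = {"k1": 0, "k2": 0, "k3": 1}
    toy_gold = {"i1": "k1", "i2": "k3", "i3": "k4", "i4": "k1"}
    toy_system = {"i1": "k2", "i2": "k1", "i3": "k4"}
    print(score(toy_gold, toy_system, toy_clusters))
```

In the toy run the system attempts three of the four instances and gets two right, so precision is 2/3, recall is 1/2 and F1 is 4/7.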
All Data Now Available
Questions and Answers
Q: Are the test documents for this task exactly the same test documents to be used in subtask #3 on English fine-grained all-words of task #17 of SemEval?
Q: Can you confirm that all the test documents used in your task are Wall Street Journal texts?
Q: Do the content words which need to be sense-tagged in the test documents for this task cover all four POS tags (noun, verb, adjective, and adverb)?
Q: How will multiword expressions be represented in the test set?
<instance id="d000.s010.t010" lemma="turn_on" pos="v">turned on</instance> the <instance id="d000.s010.t011" lemma="aircraft_engine" pos="n">aircraft engine</instance>
In the example above, participants should provide a sense key for "turn on" and "aircraft engine".
Q: Can a word instance be tagged with multiple part-of-speech tags?
System and Results
This section will be completed after the competition.
Edmonds P. and Kilgarriff A. 1998. Introduction to the special issue on evaluating word sense disambiguation systems. Journal of Natural Language Engineering, 8(4), Cambridge University.
Navigli R. 2006. Meaningful Clustering of Senses Helps Boost Word Sense Disambiguation Performance. Proc. of COLING-ACL 2006, Sydney, Australia, July 17-21, 2006.
For more information, visit the SemEval-2007 home page.