Task #7: Coarse-grained English all-words

Introduction

One of the major obstacles to effective WSD is the fine granularity of the adopted computational lexicon. Specifically, WordNet, by far the most commonly used dictionary within the NLP community, encodes sense distinctions that are too subtle even for human annotators (Edmonds and Kilgarriff, 1998). Nonetheless, many annotated resources, as well as the vast majority of disambiguation systems, rely on WordNet as a sense inventory: as a result, switching to a different sense inventory would make it hard to retrain supervised systems and would even pose copyright problems.

Following these observations, we are organizing a coarse-grained English all-words task for Semeval-2007. We tagged approximately 6,000 words from five running texts with coarse senses. Coarse senses are based on a clustering of the WordNet sense inventory obtained via a mapping to the Oxford Dictionary of English (ODE), a long-established dictionary which encodes coarse sense distinctions. The coarse-grained sense inventory was prepared semi-automatically: starting from the automatic clustering of senses produced by Navigli (2006), we manually validated the clusters for the words occurring in the texts. Annotators tagged the texts with coarse senses using a dedicated web interface. A judge will adjudicate disputed cases (given the coarse nature of the task, we expect very few). For each content word, we provide participants with its lemma and part of speech.

For disambiguation purposes, participating systems can exploit the knowledge of coarse distinctions as well as the fine-grained WordNet senses belonging to each sense cluster. Thus, supervised systems can be retrained on the usual data sets (e.g. SemCor), with each fine-grained sense annotation replaced by its sense cluster. Each system will provide a single coarse (and possibly fine) answer for each content word in the test set. Following the tradition of the previous Senseval exercises, we will provide a trial data set beforehand, followed by the test set.
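As an illustration of this retraining setup, here is a minimal sketch (not part of the task distribution) of relabeling training instances with coarse clusters. The names cluster_of and training_instances are hypothetical: cluster_of is assumed to map WordNet sense keys to cluster identifiers (it can be built from the cluster file described in the sense inventory section below), and training_instances is a list of (features, sense_key) pairs extracted from SemCor.

# Replace each fine-grained sense annotation with its coarse cluster.
# Sense keys not listed in the cluster file (e.g. monosemous words) are kept as-is.
def to_coarse(training_instances, cluster_of):
    return [(features, cluster_of.get(sense_key, sense_key))
            for features, sense_key in training_instances]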

Mailing list

We have set up a mailing list to facilitate discussion and information exchange about this task.
You can browse the e-mail discussion on the task.

Datasets and Formats

Trial and Test Set Format

The file given as input to participating systems (the test set) will adhere to the following format:

<?xml version="1.0" encoding="iso-8859-1" ?>
<!DOCTYPE corpus SYSTEM "coarse-all-words.dtd">
<corpus lang="en">
  <text id="d000">
    <sentence id="d000.s001">
      There
      <instance id="d000.s001.t001" lemma="be" pos="v">was</instance>
      a
      <instance id="d000.s001.t002" lemma="steaming" pos="a">steaming</instance>
    </sentence>
  </text>
  <text id="d002">
  </text>
</corpus>
where each <text> tag specifies a text source whose identifier is provided by the id attribute. <sentence> tags represent single sentences within each text (again identified by an id attribute). Each <sentence> tag contains zero, one or more target words, each marked with an <instance> element. Each instance specifies its unique identifier (id), lemma (lemma) and part-of-speech tag (pos). The latter can assume the values n, v, a and r for nouns, verbs, adjectives and adverbs, respectively. Instances are assumed to have an appropriate sense in the adopted sense inventory; content words with no corresponding sense in the inventory will not be tagged.
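To make the format concrete, the following is a minimal sketch of reading the test set with Python's standard library; the file name test.xml is illustrative and not part of the distribution.

import xml.etree.ElementTree as ET

# Iterate over texts, sentences and target instances in the test set.
tree = ET.parse("test.xml")
for text in tree.getroot().findall("text"):
    for sentence in text.findall("sentence"):
        for instance in sentence.findall("instance"):
            print(instance.get("id"), instance.get("lemma"),
                  instance.get("pos"), instance.text)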

Coarse-grained Sense Inventory Dataset and Format

The clustering was created automatically using the methodology described in (Navigli, 2006).
Sense clusters for words in the test set of the Semeval English coarse-grained all-words task were manually validated by an expert lexicographer. Sense clusters for all other words have not been validated, but can still be used to improve the quality of disambiguation systems. For obvious reasons, monosemous words are not reported in the file 'sense_clusters-21.senses'. Words missing from the file could not be mapped by the adopted automatic procedure.

Due to the unavailability of a full mapping of WordNet senses from 2.0 to 2.1 and the difficulty of converting the resources used to build the clustering, some information was lost during the mapping process described in (Navigli, 2006). As a result, the quality of the automatic clustering might be slightly lower than expected (of course, this does not affect the manually validated sense clusters).

The adopted sense inventory for the English coarse-grained all-words task is a coarse-grained version of WordNet 2.1. Sense clusters are created based on the manual validation, by expert lexicographers, of automatically acquired groupings of WordNet senses. Clusters are provided in a separate file, which contains one sense cluster per line. For instance, the noun spirit has 8 senses in WordNet, which we grouped into 3 clusters. As a result, the file includes three lines, one for each cluster:

spirit%1:18:01:: spirit%1:18:00::
spirit%1:26:00:: spirit%1:07:00:: spirit%1:26:01:: spirit%1:07:02:: spirit%1:07:03::
spirit%1:10:00::
WordNet senses in a cluster are represented in the WordNet sense key format and are separated by spaces.
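As a concrete illustration, here is a minimal sketch of loading this file into a map from sense key to cluster identifier; the variable names are ours and not part of the task distribution.

# One cluster per line, space-separated WordNet sense keys.
cluster_of = {}
with open("sense_clusters-21.senses") as f:
    for cluster_id, line in enumerate(f):
        for sense_key in line.split():
            cluster_of[sense_key] = cluster_id

# Two sense keys denote the same coarse sense iff they share a cluster id, e.g.
# cluster_of["spirit%1:18:01::"] == cluster_of["spirit%1:18:00::"]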

Answer File Format

Systems provide a single sense label for each instance in the test set. The format follows that of the previous Senseval evaluation exercises. To assign the appropriate coarse sense, systems may answer with any sense belonging to the corresponding cluster. This allows both systems tuned for fine-grained senses and systems exploiting knowledge of the sense groupings to participate in the evaluation exercise.

Five sample answer lines follow:

d001 d001.s001.t001 editorial%1:10:00::
d001 d001.s002.t004 principal%5:00:00:important:00
d003 d003.s056.t008 clamber%2:38:00::
d003 d003.s058.t002 gendarme%1:18:00::
d004 d004.s029.t003 painstakingly%4:02:00::
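A minimal sketch of producing lines in this format follows; the answers list and the output file name are hypothetical.

# Each answer line consists of the text id, the instance id and a sense key,
# separated by single spaces.
answers = [("d001", "d001.s001.t001", "editorial%1:10:00::")]
with open("answers.txt", "w") as f:
    for text_id, instance_id, sense_key in answers:
        f.write(f"{text_id} {instance_id} {sense_key}\n")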

Evaluation

Evaluation will be performed in terms of standard precision, recall and F1 scores. We will exclude words with untagged senses, i.e. the "U" cases present in the Senseval-3 all-words test set.
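The official scorer will be distributed with the data (see the download area below); purely as an illustration of the metrics, the sketch below counts an answer as correct when its sense key falls in the same cluster as the gold-standard sense, reusing the hypothetical cluster_of map built above.

def same_coarse_sense(a, b, cluster_of):
    # Monosemous words are not listed in the cluster file, so fall back to
    # exact sense-key comparison when a key has no cluster entry.
    if a == b:
        return True
    return a in cluster_of and b in cluster_of and cluster_of[a] == cluster_of[b]

def score(system, gold, cluster_of):
    # system, gold: dicts mapping instance ids to sense keys.
    correct = sum(1 for i, key in system.items()
                  if i in gold and same_coarse_sense(key, gold[i], cluster_of))
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1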

Download area

Trial data is already available at the main Semeval website.
Coarse-grained sense inventory for WordNet 2.1.

All Data Now Available

Test set
Scorer and answers

Please use the following reference for this data:

R. Navigli, K. Litkowski, O. Hargraves. 2007. SemEval-2007 Task 07: Coarse-Grained English All-Words Task. Proc. of the Semeval-2007 Workshop (SEMEVAL), held at the 45th Annual Meeting of the Association for Computational Linguistics (ACL 2007), Prague, Czech Republic, June 23-24, 2007.

Questions and Answers

Q: Are the test documents for this task the exact same test documents to be used in subtask #3 (English fine-grained all-words) of Semeval task #17?
A: No, our test set is a larger dataset, which includes two additional documents.

Q: Can you confirm that all the test documents used in your task are Wall Street Journal texts?
A: No, the two additional documents are from different sources.

Q: Do the content words to be sense-tagged in the test documents of this task include words of all four parts of speech (noun, verb, adjective, and adverb)?
A: Yes, it is.

Q: How will multiword expressions be represented in the test set?
A: An example follows:

<instance id="d000.s010.t010" lemma="turn_on" pos="v">turned on</instance>
the
<instance id="d000.s010.t011" lemma="aircraft_engine" pos="n">aircraft engine</instance>

In the example above, participants should provide a sense key for "turn on" and "aircraft engine".

Q: Can a word instance be tagged with multiple part-of-speech tags?
A: No, each word instance is tagged with one of the following tags: n (noun), a (adjective), v (verb), r (adverb).

System and Results

This section will be completed after the competition.

References

Edmonds P. and Kilgarriff A. 1998. Introduction to the special issue on evaluating word sense disambiguation systems. Journal of Natural Language Engineering, 8(4), Cambridge University Press.
Navigli R. 2006. Meaningful Clustering of Senses Helps Boost Word Sense Disambiguation Performance. Proc. of COLING-ACL 2006, Sydney, Australia, July 17-21, 2006.

 For more information, visit the SemEval-2007 home page.