The authors leverage the word2vec skip-gram model and WordNet glosses (i.e. word sense definitions) for word sense disambiguation. This is achieved as follows:
- A skip-gram model is trained.
- For each sense of a word according to WordNet, a vector (the “gloss vector”) is derived by averaging the vectors of the content words in the WordNet definition (“gloss”) of that sense.
- The gloss vectors are used to identify the sense of a word occurrence by taking their dot products with the context vector of that occurrence. The sense whose gloss vector has the highest dot product with the context vector is chosen, as long as it wins by a sufficient margin.
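To make the steps above concrete, here is a minimal sketch of the gloss-vector selection rule as I understand it. It is not the authors' code: the toy 2-d embeddings, the sense labels, the margin value, and the choice to build the context vector by averaging the context word vectors are all assumptions made for illustration.

```python
def average(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def average_vector(tokens, embeddings):
    """Average the embeddings of the tokens that have one (assumed to be content words)."""
    return average([embeddings[t] for t in tokens if t in embeddings])


def disambiguate(context_vec, gloss_vecs, margin):
    """Return the best-scoring sense, or None if it does not win by `margin`."""
    scores = {sense: dot(vec, context_vec) for sense, vec in gloss_vecs.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    if len(ranked) == 1:
        return ranked[0]
    best, runner_up = ranked[0], ranked[1]
    if scores[best] - scores[runner_up] >= margin:
        return best
    return None  # too close to call: leave the occurrence undisambiguated


# Hypothetical toy embeddings (standing in for trained skip-gram vectors).
EMB = {
    "money": [1.0, 0.0], "deposit": [0.9, 0.1], "loan": [1.0, 0.0],
    "river": [0.0, 1.0], "water": [0.1, 0.9],
}

# Hypothetical glosses, already reduced to content words.
GLOSS_VECS = {
    "bank.finance": average_vector(["money", "deposit"], EMB),
    "bank.river": average_vector(["river", "water"], EMB),
}

clear = disambiguate(average_vector(["loan", "money"], EMB), GLOSS_VECS, margin=0.1)
unclear = disambiguate(average_vector(["money", "river"], EMB), GLOSS_VECS, margin=0.1)
```

In this toy setup the first context selects `bank.finance`, while the second is split evenly between the two senses and is left undisambiguated because neither wins by the margin.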
The authors are then able to train word sense vectors (distinct from the gloss vectors) by modifying the skip-gram objective. These word sense vectors are then used for similarity tasks and not for word sense disambiguation. It seems to me that it would have been simpler to annotate word occurrences in the corpus with the senses than to modify the objective.
Evaluation is performed for coarse-grained WSD (i.e. disambiguating homographs).
Independence assumptions in iterative word sense disambiguation
The authors disambiguate the senses of words one word at a time, with each decision based upon the disambiguation that has already taken place. Two different strategies are considered for choosing the order in which to disambiguate the words in a context. Both strategies make a problematic independence assumption: that the sense of the word to be disambiguated is independent of the senses of the words not yet disambiguated. I haven’t read many WSD papers, but I suspect these independence assumptions aren’t particular to the approach of the authors.
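The following sketch illustrates where the independence assumption enters a greedy, one-word-at-a-time procedure. It is a hedged reconstruction, not the paper's algorithm: I assume that a disambiguated word's vector is replaced by its chosen sense vector in the context, the ordering is passed in as a plain list of positions rather than being one of the authors' two strategies, and the margin test is omitted for brevity. All vectors and sense labels are hypothetical.

```python
def average(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def disambiguate_in_order(tokens, word_vecs, sense_vecs, order):
    """Greedily pick a sense for each ambiguous token, visiting positions in `order`.

    Words not yet visited contribute their plain word vector to the context,
    i.e. their eventual sense is ignored when earlier decisions are made.
    """
    current = [word_vecs[t] for t in tokens]  # one vector per position
    chosen = {}
    for i in order:
        senses = sense_vecs.get(tokens[i])
        if not senses:
            continue  # token has no listed senses in this toy inventory
        context = average([v for j, v in enumerate(current) if j != i])
        best = max(senses, key=lambda s: dot(senses[s], context))
        chosen[i] = best
        current[i] = senses[best]  # commit: later decisions see this choice
    return chosen


# Hypothetical toy data.
WORD_VECS = {"bank": [0.6, 0.4], "interest": [0.7, 0.3]}
SENSE_VECS = {
    "bank": {"bank.finance": [1.0, 0.0], "bank.river": [0.0, 1.0]},
    "interest": {"interest.finance": [1.0, 0.0], "interest.curiosity": [0.0, 1.0]},
}

tokens = ["bank", "interest"]
left_to_right = disambiguate_in_order(tokens, WORD_VECS, SENSE_VECS, order=[0, 1])
```

Each choice is committed before the remaining words are resolved and is never revisited, which is exactly the point at which a joint decision over all senses is replaced by a sequence of independent ones.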