Lemmatization is very similar to stemming, but is more akin to synonym replacement. A lemma is a root word, as opposed to the root stem. So unlike stemming, you are always left with a valid word that means the same thing. However, the word you end up with can be completely different. A few examples will explain this.
Make sure that you have unzipped the wordnet
corpus in nltk_data/corpora/wordnet
. This will allow the WordNetLemmatizer
class to access WordNet. You should also be familiar with the part-of-speech tags covered in the Looking up Synsets for a word in WordNet recipe of Chapter 1, Tokenizing Text and WordNet Basics.