Python 3 Text Processing with NLTK 3 Cookbook

NLTK comes with a simple interface to look up words in WordNet. What you get is a list of Synset instances, which are groupings of synonymous words that express the same concept. Many words have only one Synset, but some have several. In this recipe, we'll explore a single Synset, and in the next recipe, we'll look at several in more detail.

Getting ready

Be sure you've unzipped the wordnet corpus at nltk_data/corpora/wordnet. This will allow WordNetCorpusReader to access it.

How to do it...

Now we're going to look up the Synset for cookbook, and explore some of the properties and methods of a Synset using the following code:

>>> from nltk.corpus import wordnet
>>> syn = wordnet.synsets('cookbook')[0]
>>> syn.name()
'cookbook.n.01'
>>> syn.definition()
'a book of recipes and cooking directions'

How it works...

You can look up any word in WordNet using wordnet.synsets(word) to get a list of Synsets. The list may be empty if the word is not found. The list may also have quite a few elements, as some words can have many possible meanings, and, therefore, many Synsets.

There's more...

Each Synset in the list has a number of methods you can use to learn more about it. The name() method will give you a unique name for the Synset, which you can use to get the Synset directly:

>>> wordnet.synset('cookbook.n.01')
Synset('cookbook.n.01')

The definition() method should be self-explanatory. Some Synsets also have an examples() method, which contains a list of phrases that use the word in context:

>>> wordnet.synsets('cooking')[0].examples()
['cooking can be a great art', 'people are needed who have experience in cookery', 'he left the preparation of meals to his wife']

Working with hypernyms

Synsets are organized in a structure similar to that of an inheritance tree. More abstract terms are known as hypernyms and more specific terms are hyponyms. This tree can be traced all the way up to a root hypernym.

Hypernyms provide a way to categorize and group words based on their similarity to each other. The Calculating WordNet Synset similarity recipe details the functions used to calculate the similarity based on the distance between two words in the hypernym tree:

>>> syn.hypernyms()
[Synset('reference_book.n.01')]
>>> syn.hypernyms()[0].hyponyms()
[Synset('annual.n.02'), Synset('atlas.n.02'), Synset('cookbook.n.01'), Synset('directory.n.01'), Synset('encyclopedia.n.01'), Synset('handbook.n.01'), Synset('instruction_book.n.01'), Synset('source_book.n.01'), Synset('wordbook.n.01')]
>>> syn.root_hypernyms()
[Synset('entity.n.01')]

As you can see, reference_book is a hypernym of cookbook, but cookbook is only one of the many hyponyms of reference_book. And all these types of books have the same root hypernym, which is entity, one of the most abstract terms in the English language. You can trace the entire path from entity down to cookbook using the hypernym_paths() method, as follows:

>>> syn.hypernym_paths()
[[Synset('entity.n.01'), Synset('physical_entity.n.01'), Synset('object.n.01'), Synset('whole.n.02'), Synset('artifact.n.01'), Synset('creation.n.02'), Synset('product.n.02'), Synset('work.n.02'), Synset('publication.n.01'), Synset('book.n.01'), Synset('reference_book.n.01'), Synset('cookbook.n.01')]]

The hypernym_paths() method returns a list of lists, where each list starts at the root hypernym and ends with the original Synset. Most of the time, you'll only get one nested list of Synsets.

Part of speech (POS)

You can also look up a simplified part-of-speech tag as follows:

>>> syn.pos()
'n'

There are four common part-of-speech tags (or POS tags) found in WordNet, as shown in the following table:

Part of speech	Tag
Noun	n
Adjective	a
Adverb	r
Verb	v

These POS tags can be used to look up specific Synsets for a word. For example, the word 'great' can be used as a noun or an adjective. In WordNet, 'great' has 1 noun Synset and 6 adjective Synsets, as shown in the following code:

>>> len(wordnet.synsets('great'))
7
>>> len(wordnet.synsets('great', pos='n'))
1
>>> len(wordnet.synsets('great', pos='a'))
6

These POS tags will be referenced more in the Using WordNet for tagging recipe in Chapter 4, Part-of-speech Tagging.

Python 3 Text Processing with NLTK 3 Cookbook

By : Jacob Perkins

Python 3 Text Processing with NLTK 3 Cookbook

By: Jacob Perkins

Overview of this book

Related Content you might be interested in

Current Title:

Python 3 Text Processing with NLTK 3 Cookbook

Looking up Synsets for a word in WordNet

Getting ready

How to do it...

How it works...

There's more...

Working with hypernyms

Part of speech (POS)

See also