
Python 3 Text Processing with NLTK 3 Cookbook

By: Jacob Perkins


Singularizing plural nouns


As we saw in the previous recipe, the transformation process can result in phrases such as recipes book. This is an NNS followed by an NN, when a more proper version of the phrase would be recipe book, which is an NN followed by another NN. We can apply another transform to correct these improper plural nouns.

How to do it...

The transforms.py script defines a function called singularize_plural_noun(), which depluralizes a plural noun (tagged with NNS) that is followed by another noun:

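def singularize_plural_noun(chunk):
  # Index of the first plural noun (NNS) in the chunk, or None if there is none
  nnsidx = first_chunk_index(chunk, tag_equals('NNS'))

  # Only act when the next word exists and its tag starts with NN (any noun tag)
  if nnsidx is not None and nnsidx+1 < len(chunk) and chunk[nnsidx+1][1][:2] == 'NN':
    noun, nnstag = chunk[nnsidx]
    # Strip trailing 's' from the word and trailing 'S' from the tag (NNS becomes NN)
    chunk[nnsidx] = (noun.rstrip('s'), nnstag.rstrip('S'))

  return chunk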

Applying it to recipes book gives the more correct form, recipe book.

>>> from transforms import singularize_plural_noun
>>> singularize_plural_noun([('recipes', 'NNS'), ('book', 'NN')])
[('recipe', 'NN'), ('book', 'NN')]
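
To try the transform without transforms.py at hand, the following is a minimal, self-contained sketch. The two helpers, first_chunk_index() and tag_equals(), are hypothetical stand-ins written for this example; they are assumed to behave like the helpers of the same name in transforms.py, whose real definitions are not shown in this recipe and may differ.

```python
def tag_equals(tag):
    # Hypothetical stand-in: returns a predicate that is True for
    # (word, tag) pairs whose tag is exactly `tag`.
    def predicate(word_tag):
        return word_tag[1] == tag
    return predicate


def first_chunk_index(chunk, pred):
    # Hypothetical stand-in: index of the first item satisfying pred,
    # or None when no item matches.
    for i, word_tag in enumerate(chunk):
        if pred(word_tag):
            return i
    return None


def singularize_plural_noun(chunk):
    # Same logic as the recipe's function, repeated here so the
    # example runs on its own.
    nnsidx = first_chunk_index(chunk, tag_equals('NNS'))

    if (nnsidx is not None and nnsidx + 1 < len(chunk)
            and chunk[nnsidx + 1][1][:2] == 'NN'):
        noun, nnstag = chunk[nnsidx]
        chunk[nnsidx] = (noun.rstrip('s'), nnstag.rstrip('S'))

    return chunk


# A plural noun followed by a noun is singularized.
print(singularize_plural_noun([('recipes', 'NNS'), ('book', 'NN')]))
# [('recipe', 'NN'), ('book', 'NN')]

# A plural noun followed by a non-noun is left alone.
print(singularize_plural_noun([('recipes', 'NNS'), ('are', 'VBP')]))
# [('recipes', 'NNS'), ('are', 'VBP')]

# rstrip('s') removes every trailing 's', so it is only a rough
# approximation of singularization.
print(singularize_plural_noun([('glasses', 'NNS'), ('case', 'NN')]))
# [('glasse', 'NN'), ('case', 'NN')]
```

Note that the function modifies the list it is given and returns that same list. The last call shows a limit of the approach: stripping trailing s characters works for regular plurals such as recipes, but not for words such as glasses.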

How it works...

We start...