Book Image

Python 3 Text Processing with NLTK 3 Cookbook

By : Jacob Perkins
Book Image

Python 3 Text Processing with NLTK 3 Cookbook

By: Jacob Perkins

Overview of this book

Table of Contents (17 chapters)
Python 3 Text Processing with NLTK 3 Cookbook
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Penn Treebank Part-of-speech Tags
Index

Swapping verb phrases


Swapping the words around a verb can eliminate the passive voice from particular phrases. For example, the book was great can be transformed into the great book. This kind of normalization can also help with frequency analysis, by counting two apparently different phrases as the same phrase.

How to do it...

In transforms.py is a function called swap_verb_phrase(). It swaps the right-hand side of the chunk with the left-hand side, using the verb as the pivot point. It uses the first_chunk_index() function defined in the previous recipe to find the verb to pivot around.

def swap_verb_phrase(chunk):
  def vbpred(wt):
    word, tag = wt
    return tag != 'VBG' and tag.startswith('VB') and len(tag) > 2

  vbidx = first_chunk_index(chunk, vbpred)

  if vbidx is None:
    return chunk

  return chunk[vbidx+1:] + chunk[:vbidx]

Now we can see how it works on the part-of-speech tagged phrase the book was great:

>>> swap_verb_phrase([('the', 'DT'), ('book', 'NN'), ('was...