Python 3 Text Processing with NLTK 3 Cookbook

By: Jacob Perkins

Converting tree labels


As you've seen in previous recipes, parse trees often contain a variety of Tree labels that never appear in chunk trees. If you want to use parse trees to train a chunker, you'll probably want to reduce this variety by converting some of those labels to more common ones.

Getting ready

First, we have to decide which Tree labels need to be converted. Let's take a look at that first Tree again:

Immediately, you can see that there are two alternative NP subtrees: NP-SBJ and NP-TMP. Let's convert both of those to NP. The mapping will be as follows:

Original Label    New Label
NP-SBJ            NP
NP-TMP            NP
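
The mapping above can be written as a plain Python dictionary keyed by the original label. The following is a minimal sketch, not code from the book: the helper name convert_label() is hypothetical, and it assumes that any label missing from the mapping should be left unchanged.

```python
# Label conversion mapping from the table above: original label -> new label.
mapping = {
    'NP-SBJ': 'NP',
    'NP-TMP': 'NP',
}

def convert_label(label, mapping):
    """Hypothetical helper (not from the book): return the mapped label,
    or the original label unchanged when it is not in the mapping."""
    return mapping.get(label, label)

# Labels in the mapping are converted; 'VP' is not in the mapping, so it is kept.
converted = [convert_label(label, mapping) for label in ('NP-SBJ', 'NP-TMP', 'VP')]
print(converted)
```

Using dict.get() with the label itself as the default keeps the lookup to a single expression and means the mapping only needs to list the labels you actually want to change.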

How to do it...

The convert_tree_labels() function is defined in transforms.py. It takes two arguments: the Tree to convert and a label conversion mapping. It returns a new Tree in which every label found in the mapping is replaced by its mapped value:

from nltk.tree import Tree

def convert_tree_labels(tree, mapping):
  children = []

  for t in tree:
    if isinstance(t...