Python 3 Text Processing with NLTK 3 Cookbook

By: Jacob Perkins

Flattening a deep tree


Some of the included corpora contain parsed sentences, which are often deep trees of nested phrases. Unfortunately, these trees are too deep to use for training a chunker, since IOB tag parsing is not designed for nested chunks. To make these trees usable for chunker training, we must flatten them.
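
As a rough illustration of why nesting is a problem, the sketch below converts a shallow chunk tree to IOB tags with NLTK's tree2conlltags() and then tries the same call on a nested tree. Both sentences are made up for this example and are not taken from the treebank corpus.

```python
from nltk.tree import Tree
from nltk.chunk import tree2conlltags

# A shallow chunk tree: each chunk holds (word, tag) leaves directly.
flat = Tree('S', [
    Tree('NP', [('the', 'DT'), ('dog', 'NN')]),
    ('barked', 'VBD'),
])
iob_tags = tree2conlltags(flat)
print(iob_tags)

# A nested tree (made-up sentence): an NP that contains another NP and a PP.
deep = Tree.fromstring(
    '(S (NP (NP (DT the) (NN dog)) (PP (IN in) (NP (DT the) (NN park))))'
    ' (VP (VBD barked)))'
)
try:
    tree2conlltags(deep)
    nested_ok = True
except ValueError:
    # IOB tags have no way to express a chunk inside a chunk.
    nested_ok = False
print(nested_ok)
```

Each word in the shallow tree gets exactly one B-, I-, or O tag, which is all the IOB format can represent. The nested tree is rejected because a word inside the inner NP would belong to two chunks at once.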

Getting ready

We're going to use the first parsed sentence of the treebank corpus as our example. Here's a diagram showing how deeply nested this tree is:

You may notice that the part-of-speech tags are part of the tree structure instead of being included with the word. This will be handled later using the Tree.pos() method, which was designed specifically for combining words with preterminal Tree labels such as part-of-speech tags.
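
A small sketch of what Tree.pos() does, using a made-up sentence built with Tree.fromstring() rather than the treebank sentence itself:

```python
from nltk.tree import Tree

# Part-of-speech tags are preterminal nodes here, e.g. (DT the).
tree = Tree.fromstring('(S (NP (DT the) (NN dog)) (VP (VBD barked)))')

# leaves() returns only the words.
words = tree.leaves()
print(words)

# pos() pairs each word with the label of its preterminal node.
tagged = tree.pos()
print(tagged)
```

The result of pos() is a list of (word, tag) tuples, the same shape that taggers and chunkers in NLTK work with.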

How to do it...

The transforms.py module contains a function named flatten_deeptree(). It takes a single Tree and returns a new Tree that keeps only the lowest-level trees. A helper function, flatten_childtrees(), does most of the work:

from nltk.tree...
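
The listing above is cut off in this copy. As a minimal sketch only, and not the book's transforms.py code, here is one way a pair of functions matching the description could be written. The function names come from the text; the bodies, the height thresholds, and the example sentence are assumptions made for illustration.

```python
from nltk.tree import Tree

def flatten_childtrees(trees):
    """Collect tagged words and lowest-level phrases from a list of subtrees."""
    children = []
    for t in trees:
        if not isinstance(t, Tree):
            # Bare leaf with no part-of-speech node: keep as-is (assumption).
            children.append(t)
        elif t.height() < 3:
            # Preterminal such as (DT the): becomes a (word, tag) tuple.
            children.extend(t.pos())
        elif t.height() == 3:
            # Lowest-level phrase: keep its label, with (word, tag) leaves.
            children.append(Tree(t.label(), t.pos()))
        else:
            # Deeper phrase: drop this level and recurse into its children.
            children.extend(flatten_childtrees(list(t)))
    return children

def flatten_deeptree(tree):
    """Return a new Tree that keeps only the lowest-level phrases."""
    return Tree(tree.label(), flatten_childtrees(list(tree)))

# Made-up nested sentence, not the treebank example.
deep = Tree.fromstring(
    '(S (NP (NP (DT the) (NN dog)) (PP (IN in) (NP (DT the) (NN park))))'
    ' (VP (VBD barked)))'
)
flat = flatten_deeptree(deep)
print(deep.height(), flat.height())
print(flat)
```

In this sketch the outer NP and the PP are discarded, their lowest-level NP phrases are kept, and the preposition ends up as a plain (word, tag) tuple directly under S. Every word of the original sentence is preserved in order, so flat.leaves() matches deep.pos().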