Python 3 Text Processing with NLTK 3 Cookbook

By: Jacob Perkins


Merging and splitting chunks with regular expressions


In this recipe, we'll cover two more rules for chunking. A MergeRule class joins two adjacent chunks when the end of the first chunk and the beginning of the second chunk match the given patterns. A SplitRule class divides one chunk into two at the point matched by its split pattern.

How to do it...

A SplitRule class is specified as two back-to-back curly braces, }{, with a tag pattern on either side. The chunk is split between the part matching the left pattern and the part matching the right pattern. To split a chunk after a noun, you would use <NN.*>}{<.*>.

A MergeRule class is specified by flipping the curly braces to {}. It joins adjacent chunks where the end of the first chunk matches the left pattern and the beginning of the next chunk matches the right pattern. To merge two chunks where the first ends with a noun and the second begins with a noun, you would use <NN.*>{}<NN.*>.
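The following minimal, standard-library-only sketch makes the two notations concrete. It is not NLTK's implementation. It assumes a chunk string in which each tag sits in angle brackets and each chunk sits in curly braces, loosely modeled on the string that NLTK's ChunkString class uses internally. The helper names tag_pattern_to_regex and apply_rule are hypothetical.

```python
import re


def tag_pattern_to_regex(tag_pattern):
    """Convert a tag pattern such as '<NN.*>' into a regex over a chunk string.

    Simplifying assumption: '.' inside <...> must not cross a tag or chunk
    boundary, so it is rewritten as [^{}<>]. Each <...> becomes one group so
    that a trailing quantifier (as in '<JJ>*') applies to the whole tag.
    """
    def convert(m):
        return "(?:<(?:" + m.group(1).replace(".", "[^{}<>]") + ")>)"
    return re.sub(r"<([^<>]*)>", convert, tag_pattern)


def apply_rule(rule, chunk_str):
    """Apply a split rule ('LEFT}{RIGHT') or a merge rule ('LEFT{}RIGHT') to a
    chunk string such as '{<DT><NN>}<VBD>', where braces delimit chunks."""
    if "}{" in rule:
        left, right = rule.split("}{")
        # Insert '}{' after LEFT when RIGHT follows, but only inside a chunk:
        # the next brace after the split point must be a closing '}'.
        regex = ("(?P<left>" + tag_pattern_to_regex(left) + ")"
                 + "(?=" + tag_pattern_to_regex(right) + ")"
                 + r"(?=[^{}]*\})")
        result = re.sub(regex, r"\g<left>}{", chunk_str)
    elif "{}" in rule:
        left, right = rule.split("{}")
        # Delete the '}{' boundary when LEFT ends one chunk and RIGHT starts
        # the chunk immediately after it.
        regex = ("(?P<left>" + tag_pattern_to_regex(left) + r")\}\{"
                 + "(?=" + tag_pattern_to_regex(right) + ")")
        result = re.sub(regex, r"\g<left>", chunk_str)
    else:
        raise ValueError("not a split or merge rule: %r" % rule)
    return result.replace("{}", "")  # drop any empty chunks


# Split a chunk after a noun
print(apply_rule("<NN.*>}{<.*>", "{<DT><NN><DT><NN>}<VBD>"))
# {<DT><NN>}{<DT><NN>}<VBD>

# Merge two chunks where the first ends and the second begins with a noun
print(apply_rule("<NN.*>{}<NN.*>", "{<DT><NN>}{<NNS>}<VBD>"))
# {<DT><NN><NNS>}<VBD>
```

Nouns outside any chunk are left alone by the split rule. Chunks separated by an unchunked tag are never merged. In NLTK itself you would not call a function like apply_rule. Instead, the rule strings go on their own lines under a chunk label in the grammar passed to RegexpParser.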

Note

The order of rules matters, and reordering them can change the results. The RegexpParser class applies the rules one at a time from top to bottom...
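A compact, self-contained version of the same kind of sketch shows why order matters. It is hypothetical and is not NLTK code. Each rule is applied to the output of the rule before it, top to bottom, and swapping a split rule with a merge rule changes the final chunking. The names tag_re, apply_rule and apply_in_order are illustrative only.

```python
import re


def tag_re(pattern):
    # '<NN.*>' -> regex for one angle-bracketed tag. Simplifying assumption:
    # '.' may not cross a tag or chunk boundary.
    return re.sub(r"<([^<>]*)>",
                  lambda m: "(?:<(?:" + m.group(1).replace(".", "[^{}<>]") + ")>)",
                  pattern)


def apply_rule(rule, s):
    if "}{" in rule:  # split rule LEFT}{RIGHT, applied only inside a chunk
        left, right = rule.split("}{")
        s = re.sub("(?P<l>" + tag_re(left) + ")(?=" + tag_re(right) + r")(?=[^{}]*\})",
                   r"\g<l>}{", s)
    elif "{}" in rule:  # merge rule LEFT{}RIGHT across adjacent chunks
        left, right = rule.split("{}")
        s = re.sub("(?P<l>" + tag_re(left) + r")\}\{(?=" + tag_re(right) + ")",
                   r"\g<l>", s)
    else:
        raise ValueError("not a split or merge rule: %r" % rule)
    return s.replace("{}", "")  # drop empty chunks


def apply_in_order(rules, s):
    # Each rule sees the output of the previous one, top to bottom.
    for rule in rules:
        s = apply_rule(rule, s)
    return s


rules = ["<NN.*>}{<.*>", "<NN.*>{}<NN.*>"]  # split after a noun, then merge nouns
chunked = "{<DT><NN><NN>}"                   # e.g. "the sushi roll" as one chunk

print(apply_in_order(rules, chunked))        # {<DT><NN><NN>}
print(apply_in_order(rules[::-1], chunked))  # {<DT><NN>}{<NN>}
```

With split first, the chunk is broken after the first noun, and then the merge rule joins the two nouns back together, so the original chunk survives. With merge first, there is nothing to merge yet, so the later split stands and leaves two chunks.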