Python 3 Text Processing with NLTK 3 Cookbook

By: Jacob Perkins


Chunk extraction, or partial parsing, is the process of extracting short phrases from a part-of-speech tagged sentence. This differs from full parsing in that we're interested in standalone chunks, or phrases, rather than full parse trees. The idea is that meaningful phrases can be extracted from a sentence by looking for particular patterns of part-of-speech tags.
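To make the idea of matching tag patterns concrete, here is a minimal sketch using NLTK's RegexpParser. The hand-tagged sentence and the noun-phrase grammar are illustrative assumptions, not examples taken from the recipes in this chapter.

```python
from nltk import RegexpParser

# A hand-tagged sentence using Penn Treebank tags (assumed example data;
# no tagger or corpus download is needed to run this).
tagged = [
    ("the", "DT"), ("quick", "JJ"), ("fox", "NN"),
    ("jumped", "VBD"), ("over", "IN"),
    ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"),
]

# Illustrative grammar: a noun phrase (NP) is an optional determiner,
# any number of adjectives, then one or more nouns.
grammar = "NP: {<DT>?<JJ>*<NN.*>+}"
chunker = RegexpParser(grammar)

# parse() returns a shallow tree: chunks become subtrees, other words stay as leaves.
tree = chunker.parse(tagged)

# Collect the words of every NP chunk.
noun_phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees(lambda t: t.label() == "NP")
]
print(noun_phrases)  # ['the quick fox', 'the lazy dog']
```

Note that the result is flat: each chunk stands on its own, and words such as "jumped" and "over" are simply left outside any chunk rather than being attached to a larger parse structure.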

As in Chapter 4, Part-of-speech Tagging, we'll use the Penn Treebank corpus for basic training and testing of chunk extraction. We'll also use the CoNLL2000 corpus, as it has a simpler and more flexible format that supports multiple chunk types (for more details on the conll2000 corpus and IOB tags, see the Creating a chunked phrase corpus recipe in Chapter 3, Creating Custom Corpora).
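As a rough illustration of how IOB tags encode several chunk types at once, the sketch below builds a chunk tree from hand-written (word, tag, IOB tag) triples with NLTK's conlltags2tree and converts it back with tree2conlltags. The sentence is made up for this example and is not drawn from the conll2000 corpus.

```python
from nltk.chunk import conlltags2tree, tree2conlltags

# Each triple is (word, part-of-speech tag, IOB chunk tag).
# B- begins a chunk, I- continues it, and O marks a word outside any chunk.
# This sentence is assumed example data, not taken from the conll2000 corpus.
iob = [
    ("the", "DT", "B-NP"), ("dog", "NN", "I-NP"),
    ("barked", "VBD", "B-VP"),
    ("at", "IN", "B-PP"),
    ("the", "DT", "B-NP"), ("cat", "NN", "I-NP"),
    (".", ".", "O"),
]

# Build a shallow chunk tree from the IOB triples.
tree = conlltags2tree(iob)

# List each chunk as (chunk type, words), skipping the root "S" node.
chunks = [
    (subtree.label(), [word for word, tag in subtree.leaves()])
    for subtree in tree.subtrees(lambda t: t.label() != "S")
]
print(chunks)

# Converting the tree back should reproduce the original triples.
round_trip = tree2conlltags(tree)
print(round_trip == iob)
```

Because the chunk type is carried in the tag itself (NP, VP, PP), a single flat sequence can describe several kinds of phrases, which is the flexibility the text refers to.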