Python 3 Text Processing with NLTK 3 Cookbook

By: Jacob Perkins

Introduction


Chunk extraction, or partial parsing, is the process of extracting short phrases from a part-of-speech tagged sentence. This is different from full parsing in that we're interested in standalone chunks, or phrases, instead of full parse trees (for more on parse trees, see https://en.wikipedia.org/wiki/Parse_tree). The idea is that meaningful phrases can be extracted from a sentence by looking for particular patterns of part-of-speech tags.
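To make the idea of tag patterns concrete, here is a minimal sketch in plain Python. It is not NLTK's chunker; it only illustrates matching a pattern over a sequence of part-of-speech tags. The function name extract_noun_phrases, the sample sentence, and the noun-phrase pattern (optional determiner, any adjectives, one or more nouns) are all assumptions made for illustration.

```python
import re

def extract_noun_phrases(tagged_sentence):
    """Return noun-phrase chunks as lists of (word, tag) pairs.

    Hypothetical helper: the pattern below is an illustrative assumption,
    not a rule taken from NLTK or from this book.
    """
    # Encode the tag sequence as one string, e.g. "<DT><JJ><NN>",
    # so a regular expression can scan it.
    tag_string = "".join("<%s>" % tag for _, tag in tagged_sentence)
    # Optional determiner, any number of adjectives, one or more nouns.
    pattern = re.compile(r"(?:<DT>)?(?:<JJ[RS]?>)*(?:<NNP?S?>)+")
    chunks = []
    for match in pattern.finditer(tag_string):
        # Map character offsets back to token indexes by counting tags.
        start = tag_string[:match.start()].count("<")
        end = start + match.group().count("<")
        chunks.append(tagged_sentence[start:end])
    return chunks

# Hand-tagged sample sentence using Penn Treebank tags.
sentence = [
    ("the", "DT"), ("quick", "JJ"), ("fox", "NN"),
    ("jumped", "VBD"), ("over", "IN"),
    ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"),
]
noun_phrases = extract_noun_phrases(sentence)
for phrase in noun_phrases:
    print(" ".join(word for word, _ in phrase))
```

The two noun phrases come back as standalone chunks, while the verb and preposition between them are left out. No parse tree for the whole sentence is built, which is the difference from full parsing.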

As in Chapter 4, Part-of-speech Tagging, we'll be using the Penn Treebank corpus for basic training and testing of chunk extraction. We'll also be using the conll2000 corpus, as it has a simpler and more flexible format that supports multiple chunk types (for more details on the conll2000 corpus and IOB tags, see the Creating a chunked phrase corpus recipe in Chapter 3, Creating Custom Corpora).
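As a rough illustration of how IOB tags can mark several chunk types in one sentence, the following sketch groups lines in a word / part-of-speech tag / IOB tag layout into typed chunks. The helper name iob_to_chunks and the sample lines are assumptions for illustration; NLTK's corpus readers do this parsing for you, so this is only meant to show the idea behind the format.

```python
def iob_to_chunks(iob_lines):
    """Group 'word POS IOB-tag' lines into (chunk_type, words) pairs.

    Hypothetical helper for illustration only.
    B-X begins a chunk of type X, I-X continues it, and O is outside any chunk.
    """
    chunks = []
    current_type, current_words = None, []
    for line in iob_lines:
        word, pos, iob = line.split()
        if iob.startswith("I-") and current_type == iob[2:]:
            # Continue the chunk that is currently open.
            current_words.append(word)
            continue
        # Any other tag closes the chunk in progress, if there is one.
        if current_type is not None:
            chunks.append((current_type, current_words))
        if iob == "O":
            current_type, current_words = None, []
        else:
            # B-X starts a new chunk (a stray I-X is treated the same way).
            current_type, current_words = iob[2:], [word]
    if current_type is not None:
        chunks.append((current_type, current_words))
    return chunks

# Illustrative sample lines, one token per line.
lines = [
    "Mr. NNP B-NP",
    "Meador NNP I-NP",
    "had VBD B-VP",
    "been VBN I-VP",
    "executive JJ B-NP",
    "vice NN I-NP",
    "president NN I-NP",
    "of IN B-PP",
    "Balcor NNP B-NP",
    ". . O",
]
chunks = iob_to_chunks(lines)
for chunk_type, words in chunks:
    print(chunk_type, " ".join(words))
```

Because every token carries its own chunk label, noun, verb, and prepositional phrases can sit side by side in the same flat file without any nested structure.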