Book Image

Python 3 Text Processing with NLTK 3 Cookbook

By : Jacob Perkins
Book Image

Python 3 Text Processing with NLTK 3 Cookbook

By: Jacob Perkins

Overview of this book

Table of Contents (17 chapters)
Python 3 Text Processing with NLTK 3 Cookbook
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Penn Treebank Part-of-speech Tags
Index

Training a classifier with NLTK-Trainer


In this recipe, we'll cover the train_classifier.py script from NLTK-Trainer, which lets you train NLTK classifiers from the command line. NLTK-Trainer was previously introduced at the end of Chapter 4, Part-of-speech Tagging, and again at the end of Chapter 5, Extracting Chunks.

Note

You can find NLTK-Trainer at https://github.com/japerk/nltk-trainer and the online documentation at http://nltk-trainer.readthedocs.org/.

How to do it...

Like train_tagger.py and train_chunker.py, the only required argument for train_classifier.py is the name of a corpus. The corpus must have a categories() method, because text classification is all about learning to classify categories. Here's an example of running train_classifier.py on the movie_reviews corpus:

$ python train_classifier.py movie_reviews
loading movie_reviews
2 labels: ['neg', 'pos']
using bag of words feature extraction
2000 training feats, 2000 testing feats
training NaiveBayes classifier
accuracy: 0...