Book Image

Python 3 Text Processing with NLTK 3 Cookbook

By : Jacob Perkins
Book Image

Python 3 Text Processing with NLTK 3 Cookbook

By: Jacob Perkins

Overview of this book

Table of Contents (17 chapters)
Python 3 Text Processing with NLTK 3 Cookbook
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Penn Treebank Part-of-speech Tags
Index

Training a Naive Bayes classifier


Now that we can extract features from text, we can train a classifier. The easiest classifier to get started with is the NaiveBayesClassifier class. It uses the Bayes theorem to predict the probability that a given feature set belongs to a particular label. The formula is:

P(label | features) = P(label) * P(features | label) / P(features)

The following list describes the various parameters from the previous formula:

  • P(label): This is the prior probability of the label occurring, which is the likelihood that a random feature set will have the label. This is based on the number of training instances with the label compared to the total number of training instances. For example, if 60/100 training instances have the label, the prior probability of the label is 60%.

  • P(features | label): This is the prior probability of a given feature set being classified as that label. This is based on which features have occurred with each label in the training data.

  • P(features...