Natural Language Processing with Java and LingPipe Cookbook

Thresholding classifiers


Logistic regression classifiers are often deployed with a threshold rather than with the bestCategory() method of the classification the classifier returns. That method picks the category with the highest conditional probability, which, in a 3-way classifier, can be just above one-third. This recipe will show you how to adjust classifier performance by explicitly controlling how the best category is determined.
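To make the contrast concrete, here is a minimal plain-Java sketch of the two decision rules. It uses no LingPipe classes and assumes the classifier's output is already available as a map from category to conditional probability. The class name, the threshold values, and the choice of o as the fallback label are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ThresholdDemo {

    // Argmax over conditional probabilities: the "highest probability wins" rule.
    // Assumes a non-empty map.
    public static String bestCategory(Map<String, Double> probs) {
        String best = null;
        double bestP = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : probs.entrySet()) {
            if (e.getValue() > bestP) {
                bestP = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    // Threshold rule: keep the most probable category only if its probability
    // clears that category's threshold; categories without a threshold are
    // always accepted. Otherwise return the fallback label.
    public static String thresholdCategory(Map<String, Double> probs,
                                           Map<String, Double> thresholds,
                                           String fallback) {
        String best = bestCategory(probs);
        double t = thresholds.getOrDefault(best, 0.0);
        return probs.get(best) >= t ? best : fallback;
    }

    public static void main(String[] args) {
        // Hypothetical 3-way output where the winner is barely above one-third.
        Map<String, Double> probs = new LinkedHashMap<>();
        probs.put("p", 0.34);
        probs.put("n", 0.33);
        probs.put("o", 0.33);

        // Hypothetical thresholds; in practice they are tuned on novel data.
        Map<String, Double> thresholds = new LinkedHashMap<>();
        thresholds.put("p", 0.60);
        thresholds.put("n", 0.50);

        System.out.println(bestCategory(probs));                       // p
        System.out.println(thresholdCategory(probs, thresholds, "o")); // o
    }
}
```

With this near-uniform output, the argmax rule commits to p on a probability of 0.34, while the threshold rule declines and falls back to o.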

This recipe will consider the 3-way case with the p, n, and o labels and will work with the classifier produced by the Classifier-building life cycle recipe earlier in this chapter. Its cross-validation evaluation was:

Category  Recall  Precision
p         0.64    0.57
n         0.41    0.54
o         0.81    0.81

We will run the classifier over novel data to set the thresholds.
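One way to set a threshold from novel data is to sweep candidate cutoffs from the highest score downward and keep the lowest cutoff that still meets a precision target. The sketch below assumes each held-out item comes with the classifier's probability for a single category and its gold label; it is plain Java, not LingPipe API, and `ThresholdTuner`, `Scored`, `tune`, and the data are hypothetical.

```java
import java.util.Arrays;

public class ThresholdTuner {

    // One held-out item: the classifier's probability for the target category
    // and whether the gold label really is that category.
    public static final class Scored {
        final double prob;
        final boolean isTarget;

        public Scored(double prob, boolean isTarget) {
            this.prob = prob;
            this.isTarget = isTarget;
        }
    }

    // Returns the lowest threshold whose precision on the data is at least
    // minPrecision, or Double.NaN if no threshold qualifies. Lowering the
    // threshold never lowers recall, so the lowest qualifying threshold is
    // the qualifying one with the highest recall.
    public static double tune(Scored[] data, double minPrecision) {
        Scored[] sorted = data.clone();
        // Sort by probability, highest first.
        Arrays.sort(sorted, (a, b) -> Double.compare(b.prob, a.prob));

        int truePos = 0;
        double best = Double.NaN;
        for (int i = 0; i < sorted.length; i++) {
            if (sorted[i].isTarget) {
                truePos++;
            }
            // A threshold cannot split tied scores, so only evaluate a cutoff
            // after the last item in a run of equal probabilities.
            boolean endOfTie = i + 1 == sorted.length
                    || sorted[i + 1].prob < sorted[i].prob;
            if (!endOfTie) {
                continue;
            }
            // The i + 1 items scoring >= sorted[i].prob are accepted.
            double precision = (double) truePos / (i + 1);
            if (precision >= minPrecision) {
                best = sorted[i].prob;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up held-out scores for one category, with gold labels.
        Scored[] heldOut = {
            new Scored(0.95, true), new Scored(0.90, true),
            new Scored(0.80, false), new Scored(0.70, true),
            new Scored(0.60, false), new Scored(0.50, true),
            new Scored(0.40, false), new Scored(0.30, false)
        };
        System.out.println(tune(heldOut, 0.65)); // 0.5
    }
}
```

Precision is not monotonic in the threshold (here it dips to 0.6 at a cutoff of 0.60 and recovers to about 0.67 at 0.50), which is why the sweep checks every cutoff instead of stopping at the first failure. The sketch tunes each category independently, one versus the rest, so an item that clears both the p and n thresholds would still need a tie-breaking rule.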

How to do it...

Our business use case requires that recall be maximized while p maintains .65 precision and n maintains .5 precision, for reasons discussed in the Classifier-building life cycle recipe. The o category is not important in this...