
Natural Language Processing with Java and LingPipe Cookbook



Language model classifier with tokens


Chapter 1, Simple Classifiers, covered classification without knowing what the tokens/words were. It used one language model per category, built from character slices, or n-grams, of the text. Chapter 2, Finding and Working with Words, discussed at length the process of finding tokens in text, and now we can use those tokens to build a classifier. Most of the time, classifiers are given tokenized input, so this recipe is an important introduction to the concept.
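To make the contrast between the two kinds of input concrete, here is a minimal plain-Java sketch (not LingPipe code) that lists the character n-grams a Chapter 1 style model would see next to the token n-grams a tokenized model would see. The class name CharVsTokenNGrams is hypothetical, and splitting on whitespace is an assumption standing in for a real tokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CharVsTokenNGrams {

    // Character n-grams: every contiguous slice of n characters, spaces included.
    static List<String> charNGrams(String text, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }

    // Token n-grams: contiguous runs of n tokens. Tokenization here is a crude
    // whitespace split, assumed for illustration only.
    static List<String> tokenNGrams(String text, int n) {
        List<String> tokens = Arrays.asList(text.trim().split("\\s+"));
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            grams.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return grams;
    }

    public static void main(String[] args) {
        String text = "the cat sat";
        // prints 9 three-character slices, from "the" to "sat"
        System.out.println(charNGrams(text, 3));
        // prints [the cat, cat sat]
        System.out.println(tokenNGrams(text, 2));
    }
}
```

The character model sees fragments such as "e c" that straddle word boundaries, while the token model only ever sees whole words and word sequences.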

How to do it...

This recipe shows how to train and use a tokenized language model classifier, but it ignores issues such as evaluation, serialization, and deserialization; refer to the recipes in Chapter 1, Simple Classifiers, for examples of those. The code for this recipe is in com.lingpipe.cookbook.chapter3.TrainAndRunTokenizedLMClassifier:

  1. Apart from the following code, this recipe is the same as the Training your own language model classifier recipe in Chapter 1, Simple Classifiers. The DynamicLMClassifier...
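In LingPipe, the tokenized classifier is built through DynamicLMClassifier. The sketch below does not use LingPipe at all. It is a plain-Java simplification meant only to show the mechanics: one token language model per category, trained from counts, with the category whose model gives the text the highest probability winning. It assumes a unigram token model, add-one smoothing, a uniform prior over categories, and a regex tokenizer; LingPipe's own tokenized models may use longer token n-grams and different smoothing. The class name TokenUnigramClassifier and the training sentences are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TokenUnigramClassifier {

    // Per-category token counts, per-category token totals, and the shared vocabulary.
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    private final Map<String, Integer> totals = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();

    // Lowercase, then split on runs of anything that is not a letter or digit.
    // A hypothetical stand-in for a real tokenizer such as those in Chapter 2.
    static String[] tokenize(String text) {
        return text.toLowerCase().split("[^\\p{L}\\p{N}]+");
    }

    // Training simply counts tokens for the given category.
    void train(String category, String text) {
        Map<String, Integer> catCounts = counts.computeIfAbsent(category, c -> new HashMap<>());
        for (String tok : tokenize(text)) {
            if (tok.isEmpty()) continue; // split can yield a leading empty string
            catCounts.merge(tok, 1, Integer::sum);
            totals.merge(category, 1, Integer::sum);
            vocabulary.add(tok);
        }
    }

    // Log probability of the text under one category's add-one smoothed unigram model.
    double logProb(String category, String text) {
        Map<String, Integer> catCounts = counts.getOrDefault(category, Map.of());
        int total = totals.getOrDefault(category, 0);
        int v = vocabulary.size() + 1; // +1 reserves probability mass for unseen tokens
        double lp = 0.0;
        for (String tok : tokenize(text)) {
            if (tok.isEmpty()) continue;
            int c = catCounts.getOrDefault(tok, 0);
            lp += Math.log((c + 1.0) / (total + v));
        }
        return lp;
    }

    // Choose the category whose model assigns the text the highest log probability.
    String bestCategory(String text) {
        String best = null;
        double bestLp = Double.NEGATIVE_INFINITY;
        for (String category : counts.keySet()) {
            double lp = logProb(category, text);
            if (lp > bestLp) {
                bestLp = lp;
                best = category;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        TokenUnigramClassifier classifier = new TokenUnigramClassifier();
        // Toy training data, invented purely for illustration.
        classifier.train("english", "the cat sat on the mat");
        classifier.train("english", "the dog ate the bone");
        classifier.train("spanish", "el gato se sienta en la alfombra");
        classifier.train("spanish", "el perro come el hueso");
        System.out.println(classifier.bestCategory("the cat ate"));  // english
        System.out.println(classifier.bestCategory("el gato come")); // spanish
    }
}
```

Working in log space avoids underflow when many small token probabilities are multiplied, and the smoothing keeps an unseen token from driving a category's probability to zero.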