Chapter 1, Simple Classifiers, covered classification without knowing what tokens or words were; we used character slices or n-grams to build a language model per category. Chapter 2, Finding and Working with Words, discussed at length how to find tokens in text, and now we can use those tokens to build a classifier. Most of the time, classifiers take tokenized input, so this recipe is an important introduction to the concept.
This recipe shows how to train and use a tokenized language model classifier, but it ignores issues such as evaluation, serialization, and deserialization; refer to the recipes in Chapter 1, Simple Classifiers, for examples. The code for this recipe is in com.lingpipe.cookbook.chapter3.TrainAndRunTokenizedLMClassifier:
The following code is largely the same as that found in the Training your own language model classifier recipe in Chapter 1, Simple Classifiers; the exception is the
DynamicLMClassifier...
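To make the underlying idea concrete, here is a minimal from-scratch sketch of a tokenized language model classifier: one unigram token model per category, add-one smoothing, and classification by log-likelihood. This is only an illustration of the concept, not LingPipe's DynamicLMClassifier implementation; the class name, the whitespace tokenizer (standing in for a real TokenizerFactory), and the unigram model are simplifying assumptions.

```java
import java.util.*;

// Hypothetical sketch: a per-category unigram token language model
// with add-one smoothing. Real tokenized LM classifiers (such as
// LingPipe's) use higher-order token n-grams and better smoothing.
public class TokenLmSketch {
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    private final Map<String, Integer> totals = new HashMap<>();
    private final Set<String> vocab = new HashSet<>();

    // Whitespace tokenization stands in for a configurable tokenizer.
    private static String[] tokenize(String text) {
        return text.toLowerCase().split("\\s+");
    }

    // Accumulate token counts for one category.
    public void train(String category, String text) {
        Map<String, Integer> c =
            counts.computeIfAbsent(category, k -> new HashMap<>());
        for (String tok : tokenize(text)) {
            c.merge(tok, 1, Integer::sum);
            totals.merge(category, 1, Integer::sum);
            vocab.add(tok);
        }
    }

    // Return the category whose token model gives the input
    // the highest log-likelihood.
    public String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String cat : counts.keySet()) {
            double score = 0.0;
            int total = totals.get(cat);
            for (String tok : tokenize(text)) {
                int c = counts.get(cat).getOrDefault(tok, 0);
                // add-one smoothing over the shared vocabulary
                score += Math.log((c + 1.0) / (total + vocab.size()));
            }
            if (score > bestScore) { bestScore = score; best = cat; }
        }
        return best;
    }

    public static void main(String[] args) {
        TokenLmSketch clf = new TokenLmSketch();
        clf.train("english", "the cat sat on the mat");
        clf.train("french", "le chat est sur le tapis");
        System.out.println(clf.classify("the cat"));  // english
        System.out.println(clf.classify("le chat"));  // french
    }
}
```

Because the models are built from whole tokens rather than character slices, a single informative word ("chat" versus "cat") is enough to tip the classification, which is the practical difference from the character-based classifiers of Chapter 1.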