In this chapter, we cover the following recipes:
Introduction to tokenizer factories – finding words in a character stream
Combining tokenizers – lowercase tokenizer
Combining tokenizers – stop word tokenizers
Using Lucene/Solr tokenizers
Using Lucene/Solr tokenizers with LingPipe
Evaluating tokenizers with unit tests
Modifying tokenizer factories
Finding words in languages without whitespace