Just as we combined lowercasing and whitespace normalization into a single tokenizer, we can use a filtered tokenizer to strip out stop words. Once again using search engines as our example, we can remove commonly occurring words from the input so as to normalize the text. The stop words that are typically removed convey very little information on their own, although they may convey information in context.
The input is first tokenized by whatever base tokenizer is configured; the resulting tokens are then filtered by the stop tokenizer, producing a token stream free of the stop words specified when the stop tokenizer was initialized.
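To make the two-stage idea concrete, here is a minimal, self-contained Java sketch of the pattern, not the book's actual API: the class name `StopTokenizerSketch`, the hard-coded stop-word set, and the whitespace-based base tokenizer are all illustrative assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative sketch: a base tokenizer (lowercase + whitespace split)
// wrapped by a stop-word filter, mirroring the two-stage design above.
public class StopTokenizerSketch {
    private final Set<String> stopWords;

    // The stop-word set is supplied when the filter is initialized.
    public StopTokenizerSketch(Set<String> stopWords) {
        this.stopWords = stopWords;
    }

    // Base tokenizer: lowercase the input and split on whitespace.
    private List<String> baseTokenize(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    // Filtered tokenizer: drop any token found in the stop-word set.
    public List<String> tokenize(String text) {
        return baseTokenize(text).stream()
                .filter(token -> !stopWords.contains(token))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical stop-word list for demonstration only.
        Set<String> stops = Set.of("the", "a", "of", "and", "to");
        StopTokenizerSketch tokenizer = new StopTokenizerSketch(stops);
        System.out.println(tokenizer.tokenize("The quick brown fox and the lazy dog"));
        // prints [quick, brown, fox, lazy, dog]
    }
}
```

Because the filter only sees tokens the base tokenizer emits, the stop-word list must match the base tokenizer's normalization; here the list is lowercase because the base tokenizer lowercases its input.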
You will need to download the JAR file for the book and have Java and Eclipse set up so that you can run the example.