Lucene 4 Cookbook

By: Edwood Ng, Vineeth Mohan

Obtaining a TokenStream

TokenStream is the intermediate data format passed between components in the analysis process. Every filter takes a TokenStream as input and produces a TokenStream as output. A tokenizer, by contrast, consumes text from a reader and outputs the result as a TokenStream. Let's explore TokenStream in detail in this section.

Getting ready

The Analyzer class is an abstract base class with two methods of interest. The first is createComponents(String fieldName, Reader reader), where the analyzer is put together by chaining the tokenizer and filters. The second is tokenStream(String fieldName, Reader reader), which is the method we will review in this section. We will use tokenStream to obtain a processed TokenStream so that we can examine its content after the analysis process.
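To make the division of labor between the two methods concrete, here is a minimal sketch that uses only the Java standard library. It is a simplified model, not Lucene's API: SimpleTokenStream, WhitespaceSplitter, LowerCaser, SimpleAnalyzer, and TokenStreamSketch are hypothetical names invented for illustration. In Lucene itself, createComponents returns a components object rather than the stream directly, and tokens are exposed through attributes rather than plain strings.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a token stream; NOT Lucene's TokenStream class.
interface SimpleTokenStream {
    // Returns the next token, or null when the stream is exhausted.
    String next();
}

// Tokenizer role: consumes characters from a Reader and emits
// whitespace-separated tokens.
class WhitespaceSplitter implements SimpleTokenStream {
    private final Reader input;

    WhitespaceSplitter(Reader input) { this.input = input; }

    @Override
    public String next() {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = input.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (sb.length() > 0) break; // end of the current token
                } else {
                    sb.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.length() > 0 ? sb.toString() : null;
    }
}

// Filter role: takes a token stream as input and is itself a token stream.
class LowerCaser implements SimpleTokenStream {
    private final SimpleTokenStream input;

    LowerCaser(SimpleTokenStream input) { this.input = input; }

    @Override
    public String next() {
        String token = input.next();
        return token == null ? null : token.toLowerCase(Locale.ROOT);
    }
}

// Analyzer role: subclasses assemble the chain, callers only ask for the result.
abstract class SimpleAnalyzer {
    // Mirrors the role of createComponents: chain the tokenizer and filters.
    protected abstract SimpleTokenStream createComponents(String fieldName, Reader reader);

    // Mirrors the role of tokenStream: hand back the fully processed stream.
    public final SimpleTokenStream tokenStream(String fieldName, Reader reader) {
        return createComponents(fieldName, reader);
    }
}

class TokenStreamSketch {
    // Runs the chain over the text and collects every token it produces.
    static List<String> analyze(String text) {
        SimpleAnalyzer analyzer = new SimpleAnalyzer() {
            @Override
            protected SimpleTokenStream createComponents(String fieldName, Reader reader) {
                return new LowerCaser(new WhitespaceSplitter(reader));
            }
        };
        SimpleTokenStream stream = analyzer.tokenStream("content", new StringReader(text));
        List<String> tokens = new ArrayList<>();
        for (String t = stream.next(); t != null; t = stream.next()) {
            tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Text to be passed")); // prints [text, to, be, passed]
    }
}
```

Because the filter both accepts and implements the same stream type, further filters can be stacked around it without changing the calling code, which is the property the analysis chain relies on.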

How to do it...

We need two arguments to call the tokenStream method. The first is a field name and the second is a reader:

Reader reader = new StringReader("Text to be passed");
Analyzer analyzer ...