Lucene 4 Cookbook

By: Edwood Ng, Vineeth Mohan


Before we begin, let's review Lucene's analysis process. In the previous chapter, we learned about the components involved in creating and searching an index with IndexWriter and IndexSearcher. We also looked at the analyzer and how it's used to tokenize and cleanse data, and at Lucene's internal index structure, the inverted index, which enables high-performance lookup. Finally, we touched on Term and how it's used in querying.

A term is the fundamental unit of data in a Lucene index. It is associated with a Document and has two attributes: field (analogous to a column name in a table) and value. So how does Lucene extract terms from text? You may have already guessed that an analyzer is involved, and you would be right: an analyzer is responsible for generating these terms. An analyzer is a container for tokenization and filtering processes. Tokenization, as discussed, breaks up text at word boundaries defined by a specific tokenizer component. After tokenization, filtering kicks in to massage data before...
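
To make the tokenize-then-filter idea concrete, here is a minimal conceptual sketch in plain Java. It does not use Lucene's own classes. The class name AnalysisSketch, the tiny stop-word list, the regular expression used for word boundaries, and the "field:value" string form of a term are all assumptions made for illustration only. A real Lucene analyzer works on token streams and is considerably more capable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Conceptual illustration only: this is NOT Lucene's Analyzer API.
class AnalysisSketch {

    // Hypothetical stop-word list, chosen for this example.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "is", "of"));

    // Step 1, tokenization: break text at word boundaries.
    // Assumption: a boundary is any run of characters that are not letters or digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Step 2, filtering: lowercase each token and drop stop words.
    static List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String lowered = token.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lowered)) {
                kept.add(lowered);
            }
        }
        return kept;
    }

    // Pair each surviving value with its field, mirroring a term's two attributes.
    static List<String> analyze(String field, String text) {
        List<String> terms = new ArrayList<>();
        for (String value : filter(tokenize(text))) {
            terms.add(field + ":" + value);
        }
        return terms;
    }

    public static void main(String[] args) {
        // Prints [content:quick, content:brown, content:fox]
        System.out.println(analyze("content", "The Quick brown Fox"));
    }
}
```

Keeping tokenization and filtering as separate steps mirrors the container idea described above: either step could be swapped out without touching the other.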