Lucene 4 Cookbook

By: Edwood Ng, Vineeth Mohan

Obtaining a common analyzer


Lucene provides a set of default analyzers in the lucene-analyzers-common module. Let's take a look at them in detail.

Getting ready

The following are five common analyzers Lucene provides in the lucene-analyzers-common module:

  • WhitespaceAnalyzer: Splits text at whitespace, as the name indicates. This is the only thing this analyzer does: it neither lowercases tokens nor removes stopwords.

  • SimpleAnalyzer: Splits text at non-letter characters and lowercases the resulting tokens.

  • StopAnalyzer: Splits text at non-letter characters, lowercases the resulting tokens, and removes stopwords. This analyzer is useful for plain-text content but is not ideal when the content contains terms with digits or special characters, such as product model numbers, because those characters are treated as split points and discarded. It comes with a default set of English stopwords, and you can also supply your own stopword set.

  • StandardAnalyzer: Splits text using grammar-based tokenization, normalizes and lowercases tokens, removes stopwords, and discards punctuation. It can be...
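To make the differences between the first three analyzers concrete, here is a plain-Java sketch that approximates their splitting rules without using Lucene itself. It is not Lucene's implementation or API: the class name AnalyzerSketch, its methods, and the short stopword list are hypothetical and exist only for illustration. Lucene's real analyzers produce token streams, and StopAnalyzer's default English stopword set is larger than the one used here. StandardAnalyzer's grammar-based tokenization is not approximated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only: approximates analyzer behavior with plain Java, not Lucene's API.
class AnalyzerSketch {
    // Hypothetical small stopword set; Lucene's default English set is larger.
    static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "is", "of", "to"));

    // Approximates WhitespaceAnalyzer: split on whitespace only, keeping case and punctuation.
    static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Approximates SimpleAnalyzer: any non-letter character ends a token; tokens are lowercased.
    static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString().toLowerCase(Locale.ROOT));
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    // Approximates StopAnalyzer: the simple() output with stopwords filtered out.
    static List<String> stop(String text, Set<String> stopwords) {
        List<String> tokens = new ArrayList<>();
        for (String t : simple(text)) {
            if (!stopwords.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The XK-200 is a Model of the Year";
        System.out.println("whitespace: " + whitespace(text));
        System.out.println("simple:     " + simple(text));
        System.out.println("stop:       " + stop(text, STOPWORDS));
    }
}
```

For the input "The XK-200 is a Model of the Year", the whitespace variant keeps XK-200 and the original case intact, while the simple and stop variants reduce the model number to xk, and the stop variant prints [xk, model, year]. This is the loss of information that makes StopAnalyzer a poor fit for content such as product model numbers.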