Lucene 4 Cookbook

By: Edwood Ng, Vineeth Mohan

TermVectors is a Lucene feature that lets you retrieve per-document, term-based statistical data from the index. These additional data points are useful for features such as highlighting, or for any term-based analysis and reporting. As you might expect, this feature is not enabled by default: computing these data points is expensive, and storing them increases the index size significantly.

TermVectors provides the following additional data points for each term in a document:

  • Term frequency

  • Term position(s)

  • Term offsets

Term frequency is the number of times a term appears in a document. A position is the term's ordinal location within the document's sequence of terms; the position is incremented by one for each term. Offsets are the starting and ending character positions at which the term can be located in the document's text.
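To make these three data points concrete, here is a minimal plain-Java sketch with no Lucene dependency that computes them by hand for a single piece of text. It assumes a simplified analyzer that splits on whitespace and lowercases each token, without stop-word removal; real Lucene analyzers may split or drop tokens differently. The names `TermVectorSketch`, `Occurrence`, and `termVector` are hypothetical illustration names, not part of the Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TermVectorSketch {

    /** One occurrence of a term: its token position and its character offsets. */
    record Occurrence(int position, int startOffset, int endOffset) {}

    /**
     * Hypothetical helper: builds a per-document "term vector" by hand.
     * For each distinct term it records every occurrence; the term frequency
     * is simply the number of occurrences recorded for that term.
     * Assumption: whitespace tokenization plus lowercasing, no stop words removed.
     */
    static Map<String, List<Occurrence>> termVector(String text) {
        Map<String, List<Occurrence>> vector = new TreeMap<>(); // terms sorted alphabetically
        int position = 0; // the first term is at position 0
        int i = 0;
        while (i < text.length()) {
            // Skip the whitespace between tokens.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i >= text.length()) {
                break;
            }
            int start = i; // start offset: index of the token's first character
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            String term = text.substring(start, i).toLowerCase();
            // End offset is exclusive: one past the token's last character.
            vector.computeIfAbsent(term, t -> new ArrayList<>())
                  .add(new Occurrence(position, start, i));
            position++; // each term advances the position by one
        }
        return vector;
    }

    public static void main(String[] args) {
        Map<String, List<Occurrence>> tv = termVector("the cat saw the dog");
        tv.forEach((term, occurrences) ->
                System.out.println(term + " freq=" + occurrences.size() + " " + occurrences));
    }
}
```

For the input `the cat saw the dog`, the term `the` gets a frequency of 2, positions 0 and 3, and offsets 0 to 3 and 12 to 15, while `dog` appears once at position 4 with offsets 16 to 19.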

Let's look at an example of what you can expect to see in TermVectors. Here is a piece of text to be added to a document:

humpty dumpty sat on a wall

Here is what you will retrieve from TermVectors: