Lucene 4 Cookbook

By: Edwood Ng, Vineeth Mohan

Implementing the information-based model

The information-based model in Lucene consists of three components: Distribution, Lambda, and Normalization. The setup is similar to DFRSimilarity in that you instantiate these components and pass them to the constructor. The Similarity class for this model is IBSimilarity. Here is an excerpt from Lucene's Javadoc on the components:

  1. Distribution: This is the probabilistic distribution used to model term occurrence:

    • DistributionLL: This is the Log-logistic distribution

    • DistributionSPL: This is the Smoothed power-law distribution

  2. Lambda: This is the λw parameter of the probability distribution:

    • LambdaDF: This is Nw/N, or the average number of documents where w occurs

    • LambdaTTF: This is Fw/N, or the average number of occurrences of w in the collection

  3. Normalization: This is the term frequency normalization:

    • NormalizationH1: This assumes a uniform distribution of term frequency

    • NormalizationH2: In this, term frequency density is inversely...
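
To make the roles of the three components more concrete, the following plain Java sketch works through the arithmetic by hand: it computes λw as Nw/N and Fw/N, applies an H1-style length normalization, and feeds the result into the log-logistic information content, -log(λ / (tfn + λ)). The class IbModelSketch and its method names are hypothetical and are not part of Lucene. The formulas are simplified, and Lucene's own classes may add smoothing or handle edge cases differently.

```java
// Illustrative only: plain Java with no Lucene dependency.
// The class and method names are hypothetical, chosen for this sketch.
final class IbModelSketch {

    // Lambda as Nw / N: the fraction of documents that contain the term.
    static double lambdaDf(long docFreq, long numDocs) {
        return (double) docFreq / numDocs;
    }

    // Lambda as Fw / N: the average number of occurrences of the term per document.
    static double lambdaTtf(long totalTermFreq, long numDocs) {
        return (double) totalTermFreq / numDocs;
    }

    // H1-style normalization (assumed form): scale tf by avgFieldLength / docLength.
    static double normalizeH1(double tf, double avgFieldLength, double docLength) {
        return tf * avgFieldLength / docLength;
    }

    // Log-logistic information content of a normalized term frequency.
    static double logLogistic(double tfn, double lambda) {
        return -Math.log(lambda / (tfn + lambda));
    }

    public static void main(String[] args) {
        // Made-up statistics: the term appears in 10 of 1000 documents,
        // and 3 times in a 150-token document (average length 100).
        double lambda = lambdaDf(10, 1000);
        double tfn = normalizeH1(3, 100, 150);
        double score = logLogistic(tfn, lambda);
        System.out.printf("lambda=%.4f tfn=%.2f score=%.4f%n", lambda, tfn, score);
    }
}
```

In Lucene itself you would not write these formulas by hand. Instead, you would pass one instance of each component to the constructor, for example new IBSimilarity(new DistributionLL(), new LambdaDF(), new NormalizationH1()), and set the resulting Similarity on both the IndexWriterConfig and the IndexSearcher.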