Lucene 4 Cookbook

By: Edwood Ng, Vineeth Mohan

Implementing the divergence from randomness model

In Lucene, the divergence from randomness (DFR) model is implemented as DFRSimilarity. It is made up of three components: BasicModel, AfterEffect, and Normalization. BasicModel is the basic model of information content, AfterEffect is the first normalization, and Normalization is the second (length) normalization. Here is an excerpt from Lucene's Javadoc on DFRSimilarity's components:

  1. BasicModel: This is a basic model of information content:

    • BasicModelBE: This is the limiting form of Bose-Einstein

    • BasicModelG: This is the geometric approximation of Bose-Einstein

    • BasicModelP: This is the Poisson approximation of the Binomial

    • BasicModelD: This is the divergence approximation of the Binomial

    • BasicModelIn: This is the inverse document frequency

    • BasicModelIne: This is the inverse expected document frequency (mixture of Poisson and IDF)

    • BasicModelIF: This is the inverse term frequency (approximation of I(ne))

  2. AfterEffect: This is the first normalization of information...
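
To make the three-part structure concrete, here is a minimal plain-Java sketch of how a basic model, an after effect, and a length normalization might combine into one score. It is an illustration only, not Lucene's source: the class DfrSketch, its interfaces, and the constants IN, L, and H2 are hypothetical names, and the formulas are the commonly published In, L, and H2 forms with the H2 parameter c assumed to be 1. Lucene's own classes operate on its internal statistics objects and may differ in detail.

```java
// Illustrative sketch only: hypothetical types, not Lucene's API.
final class DfrSketch {

    /** Information content of a term, given collection statistics and normalized tf. */
    public interface BasicModel {
        double score(long numDocs, long docFreq, double tfn);
    }

    /** First normalization: scales the basic model's score. */
    public interface AfterEffect {
        double score(double tfn);
    }

    /** Second (length) normalization: turns raw tf into a normalized tf. */
    public interface Normalization {
        double tfn(double tf, double docLen, double avgDocLen);
    }

    public static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    // Assumed "In" form: tfn * log2((N + 1) / (n + 0.5)).
    public static final BasicModel IN =
            (numDocs, docFreq, tfn) -> tfn * log2((numDocs + 1) / (docFreq + 0.5));

    // Assumed "L" (Laplace) form: 1 / (tfn + 1).
    public static final AfterEffect L = tfn -> 1.0 / (tfn + 1.0);

    // Assumed "H2" form with c = 1: tf * log2(1 + avgDocLen / docLen).
    public static final Normalization H2 =
            (tf, docLen, avgDocLen) -> tf * log2(1.0 + avgDocLen / docLen);

    /** Combines the three components: normalize tf first, then multiply the two scores. */
    public static double score(BasicModel basicModel, AfterEffect afterEffect,
                               Normalization normalization,
                               double tf, double docLen, double avgDocLen,
                               long numDocs, long docFreq) {
        double tfn = normalization.tfn(tf, docLen, avgDocLen);
        return basicModel.score(numDocs, docFreq, tfn) * afterEffect.score(tfn);
    }

    public static void main(String[] args) {
        // Same document, same term frequency; only the document frequency differs.
        double rare = score(IN, L, H2, 2.0, 100.0, 100.0, 1000L, 5L);
        double common = score(IN, L, H2, 2.0, 100.0, 100.0, 1000L, 500L);
        System.out.println("rare term score:   " + rare);
        System.out.println("common term score: " + common);
    }
}
```

Swapping any one of the three arguments changes the ranking behavior without touching the other two, which is the point of the decomposition. In Lucene itself, the corresponding components are passed to the DFRSimilarity constructor in the same order: basic model, after effect, normalization.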