Lucene 4 Cookbook

By: Edwood Ng, Vineeth Mohan

Implementing the divergence from randomness model


In Lucene, the divergence from randomness (DFR) model is implemented as DFRSimilarity. It is made up of three components: BasicModel, AfterEffect, and Normalization. BasicModel is a model of information content, AfterEffect is the first normalization, and Normalization is the second (length) normalization. Here is an excerpt from Lucene's Javadoc on DFRSimilarity's components:

  1. BasicModel: This is a basic model of information content:

    • BasicModelBE: This is the limiting form of Bose-Einstein

    • BasicModelG: This is the geometric approximation of Bose-Einstein

    • BasicModelP: This is the Poisson approximation of the Binomial

    • BasicModelD: This is the divergence approximation of the Binomial

    • BasicModelIn: This is the inverse document frequency

    • BasicModelIne: This is the inverse expected document frequency (mixture of Poisson and IDF)

    • BasicModelIF: This is the inverse term frequency (approximation of I(ne))

  2. AfterEffect: This is the first normalization of information...
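
To make the three-part structure more concrete, here is a minimal, self-contained sketch of how a DFR-style score could be composed from a length normalization, a basic model, and an after effect. It does not use Lucene's classes. The class DfrSketch and its method names are hypothetical, and the formulas are an assumed reading of the In basic model together with the Laplace (L) after effect and the H2 normalization, two variants that are not named in the excerpt above. Query boost is left out, so treat the numbers as illustrative and check the formulas against Lucene's source before relying on them.

```java
/** Illustrative only: mirrors the three-stage shape of a DFR score, not Lucene's classes. */
final class DfrSketch {

    /** Base-2 logarithm, since DFR formulas are usually stated in bits. */
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** Normalization (second, length normalization), assumed H2-style:
     *  rescales the raw term frequency by average length over document length. */
    static double normalizedTf(double tf, double docLen, double avgDocLen, double c) {
        return tf * log2(1.0 + c * avgDocLen / docLen);
    }

    /** BasicModel (information content), assumed In-style:
     *  normalized tf weighted by an inverse document frequency term. */
    static double basicModelIn(double tfn, long numDocs, long docFreq) {
        return tfn * log2((numDocs + 1.0) / (docFreq + 0.5));
    }

    /** AfterEffect (first normalization), assumed Laplace-style. */
    static double afterEffectL(double tfn) {
        return 1.0 / (tfn + 1.0);
    }

    /** Composes the three components: normalize tf first, then multiply
     *  the basic model's information content by the after effect. */
    static double score(double tf, double docLen, double avgDocLen, double c,
                        long numDocs, long docFreq) {
        double tfn = normalizedTf(tf, docLen, avgDocLen, c);
        return basicModelIn(tfn, numDocs, docFreq) * afterEffectL(tfn);
    }

    public static void main(String[] args) {
        // Same term statistics, two document lengths (made-up values).
        System.out.println("average-length doc: " + score(3, 100, 100, 1.0, 11, 1));
        System.out.println("double-length doc:  " + score(3, 200, 100, 1.0, 11, 1));
    }
}
```

With these made-up statistics, the document that is twice the average length ends up with a smaller normalized term frequency and therefore a lower score, which is the effect the length normalization is meant to have. In Lucene itself the three choices are not written out by hand; they are passed as component objects to the DFRSimilarity constructor.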