
Apache Solr Search Patterns

By: Jayant Kumar

The information gain model


The information gain model is a machine learning concept that can be used in place of the inverse document frequency (IDF) approach. It is based on the probability of observing two terms together, estimated from how often they occur in an index. We use the index to evaluate the co-occurrence of two terms x and y, and calculate the information gain for each term in the index:

  • P(x): Probability of a term x appearing in a listing

  • P(x|y): Probability of the term x appearing given a term y also appears
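As an illustration, both probabilities can be estimated by counting listings. This is a minimal sketch under a few assumptions. Each listing is treated as a set of terms, so a term counts once per listing. The whitespace tokenizer `tokenize` stands in for a real Solr analysis chain. The sample listings are made-up data used only for illustration. The helper names `p_x` and `p_x_given_y` are hypothetical and are not part of any Solr API.

```python
from typing import List, Set


def tokenize(listing: str) -> Set[str]:
    # Hypothetical tokenizer: lowercase and split on whitespace.
    # A real Solr field type would apply its own analyzer instead.
    return set(listing.lower().split())


def p_x(x: str, listings: List[Set[str]]) -> float:
    """P(x): fraction of listings that contain term x."""
    if not listings:
        return 0.0
    return sum(1 for terms in listings if x in terms) / len(listings)


def p_x_given_y(x: str, y: str, listings: List[Set[str]]) -> float:
    """P(x|y): among listings that contain y, the fraction that also contain x."""
    with_y = [terms for terms in listings if y in terms]
    if not with_y:
        return 0.0
    return sum(1 for terms in with_y if x in terms) / len(with_y)


# Made-up sample listings, used only to exercise the functions.
listings = [tokenize(s) for s in [
    "unique leather jacket",
    "unique denim jacket",
    "blue denim jeans",
    "unique silver ring",
]]

print(p_x("jacket", listings))                    # jacket is in 2 of 4 listings
print(p_x_given_y("jacket", "unique", listings))  # 2 of the 3 "unique" listings
```

Against a real index, the same counts would come from the document frequencies of single terms and of term pairs, rather than from an in-memory list.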

The information gain value of the term y can be computed as follows:

[Equation: information gain of the term y]

This equation says that the more often the term y appears together with the term x, relative to the total number of occurrences of x, the higher the information gain for y.
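The relationship stated in this paragraph can be sketched directly with counts. For a fixed term x, each candidate term y is scored as the number of listings containing both x and y, divided by the number of listings containing x. This is only an assumption-based illustration of the proportionality described above. It is not a reproduction of the equation, which may combine P(x) and P(x|y) differently (for example, through a logarithm or extra normalization). The function name `cooccurrence_scores` and the sample listings are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def cooccurrence_scores(x: str, listings: List[Set[str]]) -> Dict[str, float]:
    """Score each term y by (# listings with both x and y) / (# listings with x).

    Illustrates the stated relationship only; not the book's exact equation.
    """
    with_x = [terms for terms in listings if x in terms]
    if not with_x:
        return {}
    # Count, for every other term, how many of the x-listings also contain it.
    counts = Counter(y for terms in with_x for y in terms if y != x)
    return {y: c / len(with_x) for y, c in counts.items()}


# Made-up sample listings (each listing is a set of lowercase terms).
listings = [set(s.split()) for s in [
    "unique leather jacket",
    "unique denim jacket",
    "blue denim jeans",
    "warm winter jacket",
]]

scores = cooccurrence_scores("jacket", listings)
# Highest-scoring terms first; ties broken alphabetically.
for y, s in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
    print(y, round(s, 2))
```

In this sample, "unique" occurs in two of the three listings that contain "jacket". It therefore gets a higher score than terms that co-occur with "jacket" only once.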

Let us take a few examples to understand the concept.

In the earlier example, if the term unique appears with jacket a large number of times compared to the total number of occurrences of the term jacket, then...