Lucene 4 Cookbook

By: Edwood Ng, Vineeth Mohan

Overriding similarity


The Similarity class is an abstract class that defines a set of components for score calculation. To move away from the default scoring, we can create a new class that extends DefaultSimilarity (a TFIDFSimilarity subclass) or one of the other Similarity classes. In this section, we will experiment to see how each scoring component affects the overall score.
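To make the idea of overriding a single scoring component concrete, here is a minimal sketch in plain Java that does not depend on Lucene. The class names ClassicFactors, FlatTfFactors, and OverrideDemo are hypothetical stand-ins, not Lucene classes. The tf and idf formulas follow the ones documented for DefaultSimilarity, and the per-term score is simplified to tf × idf², leaving out norms and boosts.

```java
// Stand-in for the default TF-IDF factors; NOT a Lucene class (assumption).
class ClassicFactors {
    // Term-frequency factor: grows with the square root of the raw count.
    float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Inverse document frequency: rarer terms receive a larger factor.
    float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Simplified per-term score: tf * idf^2, ignoring norms and boosts.
    float termScore(float freq, long docFreq, long numDocs) {
        float idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf;
    }
}

// Hypothetical override: a match counts once, however often the term repeats.
class FlatTfFactors extends ClassicFactors {
    @Override
    float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }
}

class OverrideDemo {
    public static void main(String[] args) {
        ClassicFactors classic = new ClassicFactors();
        ClassicFactors flat = new FlatTfFactors();
        // Same term statistics, different tf behavior.
        System.out.println("classic: " + classic.termScore(4, 9, 100));
        System.out.println("flat:    " + flat.termScore(4, 9, 100));
    }
}
```

With a term frequency of 4, the classic tf factor is 2, so the classic score is twice the flat one. Only the overridden component changes; the idf contribution is inherited as is. In a real Lucene application, the same pattern would be applied to a subclass of one of the Similarity classes rather than to these stand-ins.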

Let's begin by reviewing Similarity's methods:

  • computeNorm(FieldInvertState): This calculates a normalization value for a Field at indexing time.

  • computeWeight(float, CollectionStatistics, TermStatistics): This returns a SimWeight object that is used to calculate a score. It accepts a boost (float) value for query-time boosting.

  • coord(int, int): This returns a score factor based on how many of the query's terms a document matches (term overlap), which makes coordinate-level matching possible. By default, it is disabled and returns 1.

  • queryNorm(float): This generates a normalization value for a query. The value is also passed back to the Weight...
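The effect of the coord factor described above can be illustrated with a small sketch in plain Java that does not depend on Lucene. BaseHooks and OverlapHooks are hypothetical stand-ins, not Lucene classes. BaseHooks returns 1 from coord, matching the disabled default, while OverlapHooks returns overlap / maxOverlap, the ratio DefaultSimilarity uses. The score method is a simplification that multiplies coord by the sum of the matched per-term scores.

```java
// Stand-in for the base scoring hooks; NOT a Lucene class (assumption).
class BaseHooks {
    // Disabled coordination factor: always 1, so term overlap has no effect.
    float coord(int overlap, int maxOverlap) {
        return 1.0f;
    }

    // Simplified final score: coord * sum of the matched per-term scores.
    float score(float[] matchedTermScores, int queryTermCount) {
        float sum = 0f;
        for (float s : matchedTermScores) {
            sum += s;
        }
        return coord(matchedTermScores.length, queryTermCount) * sum;
    }
}

// Hypothetical override: reward documents that match more of the query terms.
class OverlapHooks extends BaseHooks {
    @Override
    float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }
}
```

For a two-term query, a document that matches only one term with a per-term score of 2 keeps that score under BaseHooks but is halved under OverlapHooks, while a document that matches both terms is unaffected by the override.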