Lucene 4 Cookbook

By: Edwood Ng, Vineeth Mohan

Implementing the BM25 model

Let's take a look at how we use the BM25 model in Lucene. Lucene implements this model as BM25Similarity. The simplest way to use it is to instantiate it with the default parameters. The constructor also accepts two parameters for tuning. The first parameter, k1, controls nonlinear term frequency normalization, that is, how quickly the contribution of repeated terms saturates. Its default value is 1.2. The second parameter, b, controls to what degree document length normalizes the tf values: 0 disables length normalization and 1 applies it fully. Its default value is 0.75.
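To make the effect of the two parameters concrete, here is a minimal sketch of the textbook BM25 term-frequency component in plain Java. It is not Lucene's internal code: the class name Bm25TfSketch, the method tfComponent, and the argument names are hypothetical, and the sketch leaves out the idf factor and the way Lucene stores document lengths in the index.

```java
// Illustrative sketch only: the textbook BM25 term-frequency component.
// Bm25TfSketch and tfComponent are hypothetical names, not part of Lucene.
class Bm25TfSketch {

    /**
     * Computes tf * (k1 + 1) / (tf + k1 * (1 - b + b * docLength / avgDocLength)).
     *
     * @param tf           raw term frequency in the document
     * @param docLength    length of the document, in terms
     * @param avgDocLength average document length in the collection
     * @param k1           term frequency saturation parameter
     * @param b            document length normalization parameter (0 to 1)
     */
    static double tfComponent(double tf, double docLength, double avgDocLength,
                              double k1, double b) {
        // With b = 0 this factor is always 1, so document length is ignored.
        double lengthNorm = 1.0 - b + b * (docLength / avgDocLength);
        // The result grows with tf but never reaches k1 + 1.
        return tf * (k1 + 1.0) / (tf + k1 * lengthNorm);
    }

    public static void main(String[] args) {
        double k1 = 1.2;
        double b = 0.75;
        // Compare a document of average length with one twice as long.
        for (int tf : new int[] {1, 2, 5, 20}) {
            double average = tfComponent(tf, 100, 100, k1, b);
            double twiceAsLong = tfComponent(tf, 200, 100, k1, b);
            System.out.printf("tf=%d average=%.4f twiceAsLong=%.4f%n",
                    tf, average, twiceAsLong);
        }
    }
}
```

Under these assumptions, raising k1 lets repeated terms keep adding to the score for longer before it flattens out, while raising b penalizes documents that are longer than average more strongly.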

How to do it…

Here is sample code that demonstrates how to use BM25Similarity:

StandardAnalyzer analyzer = new StandardAnalyzer();
Directory directory = new RAMDirectory();
IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, analyzer);
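// k1 = 1.2f and b = 0.75f are the default values
BM25Similarity similarity = new BM25Similarity(1.2f, 0.75f);
// Register the similarity on the config so that it is used at index time
config.setSimilarity(similarity);
IndexWriter indexWriter = new IndexWriter(directory, config);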
Document doc = new Document();
TextField textField = new TextField("content", "", Field.Store.YES);
String[] contents = {"Humpty Dumpty sat...