Lucene 4 Cookbook

By: Edwood Ng, Vineeth Mohan

Implementing the language model

Lucene implements two language models, LMDirichletSimilarity and LMJelinekMercerSimilarity, each based on a different smoothing method. Smoothing is a technique that adds weight from the collection-wide term distribution so that a query term with zero frequency in a partially matched document does not produce a zero score, which would be useless for ranking. We will look at these two implementations and see how their weight distributions affect scoring.
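To make the effect of smoothing concrete, here is a minimal sketch of the two textbook smoothing formulas, written in plain Java with no Lucene dependency. It is not Lucene's scoring code: the class name SmoothingSketch, the method names, and the sample numbers are assumptions made for illustration. In this sketch, mu and lambda control how much weight the collection-wide probability receives, following the convention that a larger lambda means more weight on the collection.

```java
// Illustrative sketch only; these are the textbook formulas, not Lucene internals.
class SmoothingSketch {

    // Dirichlet smoothing: p(t|d) = (tf + mu * p(t|C)) / (docLen + mu)
    static double dirichlet(double tf, double docLen, double collectionProb, double mu) {
        return (tf + mu * collectionProb) / (docLen + mu);
    }

    // Jelinek-Mercer smoothing: p(t|d) = (1 - lambda) * tf / docLen + lambda * p(t|C)
    static double jelinekMercer(double tf, double docLen, double collectionProb, double lambda) {
        return (1.0 - lambda) * (tf / docLen) + lambda * collectionProb;
    }

    public static void main(String[] args) {
        double docLen = 10;            // hypothetical document length in terms
        double collectionProb = 0.01;  // hypothetical probability of the term in the whole collection

        // A term missing from the document (tf = 0) still gets a nonzero probability.
        System.out.println("Dirichlet, tf=0:      " + dirichlet(0, docLen, collectionProb, 2000));
        System.out.println("Jelinek-Mercer, tf=0: " + jelinekMercer(0, docLen, collectionProb, 0.7));

        // A term that does occur (tf = 2) gets a higher probability under both methods.
        System.out.println("Dirichlet, tf=2:      " + dirichlet(2, docLen, collectionProb, 2000));
        System.out.println("Jelinek-Mercer, tf=2: " + jelinekMercer(2, docLen, collectionProb, 0.7));
    }
}
```

Without smoothing, the tf=0 cases would be exactly zero. Note that with a short document and a large mu, the Dirichlet estimate stays close to the collection probability, while Jelinek-Mercer applies the same fixed blend regardless of document length.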

How to do it…

We will look at LMDirichletSimilarity first, reusing the test case from the previous section but reverting the second sentence to its original, unextended input:

// Build an in-memory index with the standard analyzer
StandardAnalyzer analyzer = new StandardAnalyzer();
Directory directory = new RAMDirectory();
IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, analyzer);
// Score with the Dirichlet-smoothed language model; 2000 is the smoothing parameter mu
LMDirichletSimilarity similarity = new LMDirichletSimilarity(2000);
config.setSimilarity(similarity);
IndexWriter indexWriter = new IndexWriter(directory, config);
Document doc = new Document();
TextField...