Lucene 4 Cookbook

By: Edwood Ng, Vineeth Mohan

Overview of this book

Lucene 4 Cookbook

Credits

About the Authors

About the Reviewers

www.PacktPub.com

Preface

Introducing Lucene
- Installing Lucene
- Setting up a simple Java Lucene project
- Obtaining an IndexWriter
- Creating an analyzer
- Creating fields
- Creating and writing documents to an index
- Deleting documents
- Obtaining an IndexSearcher
- Creating queries with the Lucene QueryParser
- Performing a search
- Enumerating results

Analyzing Your Text
- Obtaining a common analyzer
- Obtaining a TokenStream
- Obtaining TokenAttribute values
- Using PositionIncrementAttribute
- Using PerFieldAnalyzerWrapper
- Defining custom TokenFilters
- Defining custom analyzers
- Defining custom tokenizers
- Defining custom attributes

Indexing Your Data
- Obtaining an IndexWriter
- Creating a StringField
- Creating a TextField
- Creating a numeric field
- Creating a DocValue Field
- Transactional commits and index versioning
- Reusing field and document objects per thread
- Delving into field norms
- Changing similarity implementation used during indexing

Searching Your Indexes
- Obtaining IndexReaders
- Un-inverting single-valued fields in memory with FieldCache
- Constructing queries
- Specifying sort logic
- Forming a search result
- Using Collectors
- Sorting with custom FieldComparator

Near Real-time Searching
- Using the DirectoryReader to open index in Near Real-Time
- Using the SearcherManager to refresh IndexSearcher
- Generational indexing with TrackingIndexWriter
- Maintaining search sessions with SearcherLifetimeManager
- Performance tuning: latency and throughput

Querying and Filtering Data
- Performing advanced filtering
- Creating a custom filter
- Searching with QueryParser
- TermQuery and TermRangeQuery
- PrefixQuery and WildcardQuery
- PhraseQuery and MultiPhraseQuery
- NumericRangeQuery
- DisjunctionMaxQuery
- CustomScoreQuery

Flexible Scoring
- Overriding similarity
- Implementing the BM25 model
- Implementing the language model
- Implementing the divergence from randomness model
- Implementing the information-based model

Introducing Elasticsearch
- Getting Elasticsearch
- Creating a new index
- Predefine field mappings
- Adding a document
- Deleting a document
- Updating a document
- Performing bulk indexing
- Searching the index
- Scaling Elasticsearch

Extending Lucene with Modules
- Exploring spatial search
- Implementing joins
- Performing faceting
- Implementing grouping
- Employing autosuggest
- Implementing highlighting

Index

Implementing the information-based model

The information-based model in Lucene consists of three components: Distribution, Lambda, and Normalization. The setup is similar to that of DFRSimilarity: you instantiate these components and pass them to the constructor. The Similarity class for this model is IBSimilarity. Here is an excerpt from Lucene's Javadoc on the components:

Distribution: This is the probabilistic distribution used to model term occurrence:
- DistributionLL: This is the log-logistic distribution
- DistributionSPL: This is the smoothed power-law distribution
Lambda: This is the λ_w parameter of the probability distribution:
- LambdaDF: This is N_w/N, or the average number of documents where w occurs
- LambdaTTF: This is F_w/N, or the average number of occurrences of w in the collection
Normalization: This is term frequency normalization:
- NormalizationH1: This assumes a uniform distribution of term frequency
- NormalizationH2: This assumes that term frequency density is inversely...
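To make the role of each component more concrete, the following is a minimal, standalone sketch in plain Java (no Lucene dependency) of how the IB model combines them for one term in one document: the Normalization turns the raw term frequency tf into a normalized tfn, the Lambda estimates λ_w from collection statistics, and the Distribution turns tfn and λ_w into a score. The formulas mirror our reading of the Lucene 4.x source code and should be treated as an assumption to verify against your version. The class IBScoreSketch, its methods, and the statistics in main are hypothetical, and boosts and Lucene's lossy encoding of document length into norms are ignored. In an actual application you would not compute these by hand; you would pass the components to the constructor, for example new IBSimilarity(new DistributionLL(), new LambdaDF(), new NormalizationH2()), and set the result on both IndexWriterConfig and IndexSearcher via setSimilarity.

```java
/**
 * Hypothetical, standalone sketch of how the IB model combines a Lambda,
 * a Normalization and a Distribution into a score. The formulas follow our
 * reading of the Lucene 4.x sources (an assumption); this is not Lucene's API.
 * Boosts and norm encoding are deliberately left out.
 */
public class IBScoreSketch {

    // LambdaDF: lambda = (docFreq + 1) / (numberOfDocuments + 1)
    public static double lambdaDF(long docFreq, long numDocs) {
        return (docFreq + 1.0) / (numDocs + 1.0);
    }

    // LambdaTTF: lambda = (totalTermFreq + 1) / (numberOfDocuments + 1)
    public static double lambdaTTF(long totalTermFreq, long numDocs) {
        return (totalTermFreq + 1.0) / (numDocs + 1.0);
    }

    // NormalizationH1: tfn = tf * c * avgLen / len
    public static double tfnH1(double tf, double c, double avgLen, double len) {
        return tf * c * avgLen / len;
    }

    // NormalizationH2: tfn = tf * log2(1 + c * avgLen / len)
    public static double tfnH2(double tf, double c, double avgLen, double len) {
        return tf * (Math.log(1 + c * avgLen / len) / Math.log(2));
    }

    // DistributionLL (log-logistic): score = -ln(lambda / (tfn + lambda))
    public static double scoreLL(double tfn, double lambda) {
        return -Math.log(lambda / (tfn + lambda));
    }

    // DistributionSPL (smoothed power-law):
    // score = -ln((lambda^(tfn / (tfn + 1)) - lambda) / (1 - lambda))
    public static double scoreSPL(double tfn, double lambda) {
        if (lambda == 1.0) {
            lambda = 0.99; // avoid dividing by zero when lambda is exactly 1
        }
        return -Math.log((Math.pow(lambda, tfn / (tfn + 1)) - lambda) / (1 - lambda));
    }

    public static void main(String[] args) {
        // Hypothetical statistics for one term in one document.
        long numDocs = 1000;
        long docFreq = 50;
        double tf = 3;
        double docLen = 120;
        double avgLen = 100;

        // Lambda component first, then normalization, then the distribution.
        double lambda = lambdaDF(docFreq, numDocs);
        double tfn = tfnH2(tf, 1.0, avgLen, docLen);
        System.out.printf("lambda=%.4f tfn=%.4f LL=%.4f SPL=%.4f%n",
                lambda, tfn, scoreLL(tfn, lambda), scoreSPL(tfn, lambda));
    }
}
```

Swapping scoreSPL for scoreLL, or lambdaTTF for lambdaDF, in main is a quick way to compare how each choice changes the score for the same statistics.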