Apache Solr Search Patterns

By: Jayant Kumar

Credits

About the Author

About the Reviewers

www.PacktPub.com

Preface

Solr Indexing Internals

The job site problem statement – Solr indexing fundamentals

Working of analyzers, tokenizers, and filters

Handling a multilingual search

Measuring the quality of search results

The e-commerce problem statement

The job site problem statement

Challenges of large-scale indexing

The SolrCloud solution

Customizing the Solr Scoring Algorithm

Relevance calculation

Building a custom scorer

Drawbacks of the TF-IDF model

The information gain model

Implementing the information gain model

Options to TF-IDF similarity

Solr Internals and Custom Queries

Working of a scorer on an inverted index

Working of OR and AND clauses

The eDisMax query parser

Using BRS queries instead of DisMax

Building a custom query parser

Solr for Big Data

Introduction to big data

Getting data points using facets

Radius faceting for location-based data

Data analysis using pivot faceting

Graphs for analytics

Solr in E-commerce

Designing an e-commerce search

Handling unclean data

Handling variations in the product

Problems and solutions of flash sale searches

Faceting with the option of multi-select

Faceting with hierarchical taxonomy

Faceting with size

Implementing semantic search

Solr for Spatial Search

Features of spatial search

Lucene 4 spatial module

Indexing for spatial search

Searching and filtering on a spatial index

Distance sort and relevancy boost

Advanced concepts

Using Solr in an Advertising System

Ad system functionalities

Architecture of an ad distribution system

Requirements of an ad distribution system

Performance improvements

Merging Solr with Redis

AJAX Solr

The purpose of AJAX Solr

The AJAX Solr architecture

Working with AJAX Solr

Performance tuning

SolrCloud

The SolrCloud architecture

Centralized configuration

Setting up SolrCloud

Distributed indexing and search

Routing documents to a particular shard

Adding more nodes to the SolrCloud

Fault tolerance and high availability in SolrCloud

Advanced sharding with SolrCloud

Asynchronous calls

Migrating documents to another collection

Sizing and monitoring of SolrCloud

Using SolrCloud as a NoSQL database

Text Tagging with Lucene FST

Text Tagging with Lucene FST

An overview of FST and text tagging

Implementation of FST in Lucene

Text tagging algorithms

Using Solr for text tagging

Implementing a text tagger using Solr

Index

Options to TF-IDF similarity

In addition to the default TF-IDF similarity implementation, Lucene and Solr ship with several other similarity implementations out of the box. These models are also based on how often the searched term occurs in a document and on how many documents contain it. However, the concepts and algorithms they use to calculate the score differ.
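
To make the shared inputs concrete, the following Python sketch computes the statistics these models rely on and feeds them into a TF-IDF weight. The statistics are how often a term occurs in each document, how many documents contain it, and the collection size. The weight uses the tf function (square root of the raw count) and the idf function (1 + ln(N / (df + 1))) of Lucene's classic similarity. The toy corpus and the function names `collection_stats` and `classic_tfidf` are hypothetical. Lucene's full formula also multiplies in factors such as field norms and boosts, so these numbers will not match Solr's scores.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each document is a list of already-analyzed tokens.
DOCS = [
    ["java", "developer", "java", "solr"],
    ["python", "developer"],
    ["solr", "search", "engineer"],
]


def collection_stats(docs, term):
    """Return (term frequency in each doc, document frequency, number of docs)."""
    tfs = [Counter(doc)[term] for doc in docs]
    df = sum(1 for tf in tfs if tf > 0)
    return tfs, df, len(docs)


def classic_tfidf(tf, df, n_docs):
    """tf * idf using the tf and idf functions of Lucene's classic similarity.

    Field norms, boosts, and the other factors of Lucene's full formula are omitted.
    """
    if tf == 0:
        return 0.0
    return math.sqrt(tf) * (1.0 + math.log(n_docs / (df + 1)))


tfs, df, n = collection_stats(DOCS, "java")
scores = [classic_tfidf(tf, df, n) for tf in tfs]
print(tfs, df, n)
print([round(s, 3) for s in scores])
```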

Let us go through some of the most commonly used ranking algorithms.

BM25 similarity

The Best Matching 25 (BM25) algorithm is a probabilistic Information Retrieval (IR) model, whereas TF-IDF is a vector space model for information retrieval. A probabilistic IR model works as follows. Given a set of relevant and non-relevant documents, we can estimate the probability that a term appears in a relevant document. This estimate can then serve as the basis of a classifier that decides whether a document is relevant or not.
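
The probabilistic idea above can be illustrated with the Robertson-Sparck Jones relevance weight. This is the term weight of the binary independence model that BM25 grew out of. In this minimal Python sketch, the judged sample `JUDGED` and the helper names are hypothetical. `p` estimates the probability that a term appears in a relevant document, and `q` estimates the probability that it appears in a non-relevant one. Summing the resulting log-odds weights over a document's terms gives a score, and a threshold on that score turns it into a relevant/non-relevant classifier. Solr has no relevance judgments at query time, so this illustrates the model rather than anything Solr computes.

```python
import math

# Hypothetical judged sample: (set of terms in the document, is_relevant).
JUDGED = [
    ({"solr", "search", "java"}, True),
    ({"solr", "lucene"}, True),
    ({"java", "spring"}, False),
    ({"python", "django"}, False),
    ({"search", "ads"}, False),
]


def rsj_weight(term, judged):
    """Robertson-Sparck Jones relevance weight of a term.

    p estimates P(term present | relevant) and q estimates
    P(term present | non-relevant). The +0.5 smoothing keeps the
    logarithm finite when a count is zero.
    """
    relevant = [terms for terms, is_rel in judged if is_rel]
    non_relevant = [terms for terms, is_rel in judged if not is_rel]
    r = sum(term in terms for terms in relevant)      # relevant docs containing the term
    s = sum(term in terms for terms in non_relevant)  # non-relevant docs containing the term
    p = (r + 0.5) / (len(relevant) + 1.0)
    q = (s + 0.5) / (len(non_relevant) + 1.0)
    return math.log((p * (1.0 - q)) / (q * (1.0 - p)))


def relevance_score(doc_terms, judged):
    """Sum the weights of a document's terms. Comparing this score with a
    threshold gives a simple relevant/non-relevant classifier."""
    return sum(rsj_weight(t, judged) for t in doc_terms)


print(round(rsj_weight("solr", JUDGED), 3))    # -> 3.555 (only in relevant docs)
print(round(rsj_weight("python", JUDGED), 3))  # -> -1.099 (only in non-relevant docs)
print(relevance_score({"solr", "search"}, JUDGED) > relevance_score({"python", "django"}, JUDGED))
```

Suppose there are no judged relevant documents, so the relevant set is empty. The same weight then reduces to ln((N - n + 0.5) / (n + 0.5)), where N is the collection size and n is the number of documents containing the term. This is the IDF component that BM25 builds on.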

On the practical side, the BM25 model also defines the weight of each term as the product of a term frequency function and an inverse document frequency...
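
The product of a term frequency function and an inverse document frequency can be sketched in Python with the textbook BM25 formulation:

- The tf factor is tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl)). It saturates as a term repeats and is normalized by the document length dl relative to the average length avgdl.
- The IDF is the non-negative variant ln(1 + (N - df + 0.5) / (df + 0.5)).

The values k1 = 1.2 and b = 0.75 are assumed here because they are the defaults of Lucene's `BM25Similarity`. The function names are hypothetical, and Solr's actual scores will differ slightly because Lucene stores document lengths in a compressed form.

```python
import math

K1 = 1.2   # term-frequency saturation (Lucene BM25Similarity default)
B = 0.75   # strength of document-length normalization (Lucene default)


def bm25_idf(df, n_docs):
    """Non-negative BM25 IDF: ln(1 + (N - df + 0.5) / (df + 0.5))."""
    return math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))


def bm25_tf(tf, doc_len, avg_doc_len, k1=K1, b=B):
    """Saturating term-frequency factor with document-length normalization."""
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + norm)


def bm25_term_weight(tf, df, n_docs, doc_len, avg_doc_len):
    """BM25 weight of one term in one document: tf factor times IDF."""
    if tf == 0:
        return 0.0
    return bm25_tf(tf, doc_len, avg_doc_len) * bm25_idf(df, n_docs)


# Saturation: repeated occurrences add less and less to the tf factor.
for tf in (1, 2, 5, 20):
    print(tf, round(bm25_tf(tf, doc_len=100, avg_doc_len=100), 3))
```

For an average-length document, the tf factor is exactly 1.0 at tf = 1. As tf grows, it climbs toward k1 + 1 = 2.2 but never reaches it. This bounded growth distinguishes BM25 from the unbounded tf growth of classic TF-IDF. Recent Lucene versions omit the constant (k1 + 1) factor from the numerator. That rescales every score by the same amount, so the ranking does not change.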