Apache Solr Search Patterns

By: Jayant Kumar

Implementing the information gain model

The problem with the information gain model is that, for each term in the index, we have to evaluate its co-occurrence with every other term. The cost of the algorithm is therefore quadratic in the number of terms: every pair of terms x and y has to be examined, so a vocabulary of N terms yields on the order of N² pairs. This is not feasible to compute on a single machine for an index of any realistic size. The recommended approach is to write a MapReduce job and use a distributed Hadoop cluster to compute the information gain for each term in the index.
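To see why the cost grows quadratically, the following minimal Python sketch counts document-level co-occurrence for every pair of terms on a single machine. It assumes a toy in-memory corpus in which each document is a list of tokens; `count_cooccurrences` and `candidate_pairs` are hypothetical helpers for illustration, not part of Solr or Lucene.

```python
from collections import Counter
from itertools import combinations


def candidate_pairs(num_terms: int) -> int:
    """Number of unordered term pairs (x, y) to evaluate for a vocabulary of num_terms terms."""
    return num_terms * (num_terms - 1) // 2


def count_cooccurrences(docs):
    """Brute-force count of the documents in which each pair of distinct terms appears together."""
    pair_counts = Counter()
    for doc in docs:
        # Every pair of distinct terms in the document is enumerated,
        # so the work per document is quadratic in its vocabulary size.
        terms = sorted(set(doc))
        for x, y in combinations(terms, 2):
            pair_counts[(x, y)] += 1
    return pair_counts


if __name__ == "__main__":
    docs = [["solr", "search", "index"], ["solr", "index"], ["hadoop", "index"]]
    print(count_cooccurrences(docs))
    # A vocabulary of one million distinct terms already yields about 5 * 10**11 candidate pairs.
    print(candidate_pairs(1_000_000))
```

Even before any information gain is computed, the pair count for a million-term vocabulary (499,999,500,000 pairs) shows why the work is pushed out to a cluster.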

Our distributed Hadoop cluster would do the following:

Count all occurrences of each term in the index
Count all occurrences of each pair of co-occurring terms in the index
Construct a hash table or a map of co-occurring terms
Calculate the information gain for each term and store it in a file in the Hadoop cluster
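The four steps above map naturally onto the MapReduce model. The following sketch simulates the map, shuffle, and reduce phases in plain Python on a toy corpus rather than running on Hadoop. Because the information gain formula itself is not reproduced in this section, the `information_gain` function below is a hypothetical stand-in that averages the pointwise mutual information between a term and the terms it co-occurs with; a real job would substitute the model's own formula. All function names here are illustrative assumptions.

```python
import math
from collections import defaultdict
from itertools import combinations


def map_document(doc):
    """Mapper: emit (term, 1) per distinct term and ((x, y), 1) per co-occurring pair."""
    terms = sorted(set(doc))
    for t in terms:
        yield t, 1
    for pair in combinations(terms, 2):
        yield pair, 1


def shuffle(mapped):
    """Group emitted values by key, as Hadoop does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reducer (steps 1 and 2): sum the document counts for each term and each term pair."""
    return {key: sum(values) for key, values in groups.items()}


def build_cooccurrence_map(counts):
    """Step 3: map each term to the document counts of the terms it co-occurs with."""
    cooc = defaultdict(dict)
    for key, n in counts.items():
        if isinstance(key, tuple):
            x, y = key
            cooc[x][y] = n
            cooc[y][x] = n
    return cooc


def information_gain(term, counts, cooc, num_docs):
    """Hypothetical stand-in: mean pointwise mutual information of term with its neighbours."""
    neighbours = cooc.get(term, {})
    if not neighbours:
        return 0.0
    p_x = counts[term] / num_docs
    total = 0.0
    for other, n_xy in neighbours.items():
        p_y = counts[other] / num_docs
        total += math.log((n_xy / num_docs) / (p_x * p_y))
    return total / len(neighbours)


def run_job(docs):
    mapped = (kv for doc in docs for kv in map_document(doc))
    counts = reduce_counts(shuffle(mapped))
    cooc = build_cooccurrence_map(counts)
    terms = [k for k in counts if not isinstance(k, tuple)]
    # Step 4: on Hadoop the scores would be written to a file in HDFS; here they are returned.
    return {t: information_gain(t, counts, cooc, len(docs)) for t in terms}
```

On a real cluster the mapper and reducer run in parallel across many nodes, which is what makes the quadratic pair enumeration tractable.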

To use this in our scoring algorithm, we will need to build a custom scorer in which the IDF calculation is overridden by the algorithm that derives the information gain for the term from the Hadoop cluster...
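A minimal sketch of the scoring idea follows, written in plain Python rather than as a Lucene `Similarity` subclass. The term weight comes from a precomputed information gain table (such as the file produced by the Hadoop job) and falls back to classic IDF when a term is missing. The fallback formula, 1 + ln(numDocs / (docFreq + 1)), and the sqrt(tf) term-frequency factor follow Lucene's classic `DefaultSimilarity`; the table format, the fallback behaviour, and the `InformationGainScorer` and `ig_table` names are assumptions for illustration.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Classic Lucene-style IDF: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


class InformationGainScorer:
    """Hypothetical scorer that replaces IDF with a precomputed information gain weight."""

    def __init__(self, ig_table: dict, num_docs: int):
        # ig_table: term -> information gain, e.g. loaded from the Hadoop job's output file.
        self.ig_table = ig_table
        self.num_docs = num_docs

    def term_weight(self, term: str, doc_freq: int) -> float:
        # Prefer the information gain when the offline job scored this term; else use IDF.
        if term in self.ig_table:
            return self.ig_table[term]
        return classic_idf(doc_freq, self.num_docs)

    def score(self, term_freqs: dict, doc_freqs: dict) -> float:
        """Sum sqrt(tf) * weight over the query terms that occur in the document."""
        return sum(
            math.sqrt(tf) * self.term_weight(term, doc_freqs[term])
            for term, tf in term_freqs.items()
            if tf > 0
        )
```

Keeping the IDF fallback means that terms the offline job never scored (for example, terms added after the last Hadoop run) still contribute to relevance in the usual way.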