Apache Solr Search Patterns

By: Jayant Kumar

Text tagging algorithms

The process of text tagging can be explained by the following figure:

A document is first tokenized, and the tokens are passed to the naive tagger, which uses a tagging algorithm to find the tags. The geo-coordinate finder then identifies the geo-locations (latitude-longitude coordinates) that correspond to those tags. The resulting geo-locations are then available as the output.
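The flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the book: the function names, the exact-match tagging rule, and the small `GAZETTEER` dictionary (with approximate coordinates) are all hypothetical.

```python
import re

# Hypothetical lookup table: tag -> (latitude, longitude).
# The entries and approximate coordinates are for illustration only.
GAZETTEER = {
    "london": (51.5074, -0.1278),
    "paris": (48.8566, 2.3522),
}


def tokenize(document):
    """Lowercase the document and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", document.lower())


def naive_tagger(tokens, known_tags):
    """Return tokens that exactly match a known tag, in order of first appearance."""
    tags = []
    for token in tokens:
        if token in known_tags and token not in tags:
            tags.append(token)
    return tags


def geo_coordinate_finder(tags, gazetteer):
    """Map each tag to its (latitude, longitude) pair."""
    return {tag: gazetteer[tag] for tag in tags}


def tag_document(document, gazetteer=GAZETTEER):
    """Run the three stages: tokenize, tag, then look up coordinates."""
    tokens = tokenize(document)
    tags = naive_tagger(tokens, gazetteer.keys())
    return geo_coordinate_finder(tags, gazetteer)
```

Calling `tag_document("Flights from London to Paris")` would return the coordinates stored for `london` and `paris`. Exact matching is the simplest possible tagging rule; the algorithms discussed next relax it.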

There are various text tagging algorithms, each of which has its own benefits. Let us go through some of the algorithms that can be used for text tagging.

Fuzzy string matching algorithm

The fuzzy string matching algorithm can be used to match two strings either exactly or partially. The relationship between a set of n elements and another set of m elements is said to be fuzzy when the two sets only partially match. Using this algorithm, we can identify strings that are similar to one or more strings in a given set, which is like drawing similar terms from that set.
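As a rough illustration of partial matching, the sketch below uses the `difflib` module from the Python standard library to pick out the candidate strings that resemble a query string. The similarity ratio computed by `difflib` is a stand-in chosen for brevity and is not necessarily the measure this chapter goes on to use; the helper name `similar_terms` and the `cutoff` value of 0.6 are assumptions.

```python
import difflib


def similar_terms(query, candidates, cutoff=0.6, limit=3):
    """Return up to `limit` candidates whose similarity to `query` is at least `cutoff`.

    Similarity is difflib's ratio in the range [0, 1], where 1.0 means the
    strings are identical. Results are ordered from most to least similar.
    """
    return difflib.get_close_matches(query, candidates, n=limit, cutoff=cutoff)


def similarity(a, b):
    """Return difflib's similarity ratio for two strings."""
    return difflib.SequenceMatcher(None, a, b).ratio()
```

For example, `similar_terms("londn", ["london", "paris", "berlin"])` would keep `london` as a partial match for the misspelled query while rejecting the unrelated names. Raising `cutoff` towards 1.0 moves the behaviour closer to exact matching.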

Suppose we want to find the similarity between two words, say jumps...