Solr Cookbook - Third Edition

By: Rafal Kuc

Stemming different languages

Stemming, the process of reducing words to their root form (or stem), is a very common requirement. Imagine a book e-commerce store in which we index the books' names and descriptions. We want to be able to find words such as shown and showed when the user types the word show, and vice versa. We can meet this requirement with stemming algorithms. However, there is no general-purpose stemmer; stemmers are language-specific. This recipe shows how to add stemming to your data analysis chain and where to look for a list of stemmers.
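To give a feel for where a stemmer sits in the analysis chain, here is a minimal sketch of an English stemming field type for schema.xml. It reuses the text_stem type name that the description field in this recipe refers to. The tokenizer, the lowercase filter, and the choice of solr.PorterStemFilterFactory are assumptions made for illustration and are not necessarily the configuration this recipe uses.

```xml
<!-- Illustrative sketch only: the filter choices below are assumptions -->
<fieldType name="text_stem" class="solr.TextField" positionIncrementGap="100">
  <!-- A single analyzer without a type attribute is used at both index and query time -->
  <analyzer>
    <!-- Split the text into tokens -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Lowercase the tokens before they reach the stemmer -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Reduce English tokens to their stems with the Porter algorithm -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the same analyzer runs at index time and at query time, terms that reduce to the same stem match each other. Which word forms end up sharing a stem depends on the stemming algorithm. For other languages, a different filter would take the stemmer's place, for example solr.SnowballPorterFilterFactory with its language attribute set.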

How to do it...

To stem English text, we need to take the following steps:

We will start with the index structure. Let's assume that our index consists of three fields that we defined in the schema.xml file:

<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="name" type="string" indexed="true" stored="true" />
<field name="description" type="text_stem" indexed...