
Administrating Solr

By: Surendra Mohan

Overview of this book

Implementing search on web products is practically mandatory these days. Apache Solr is a robust search engine, but simply implementing it and then forgetting about it is not a good idea, especially when you have to fight for the search ranking of your web product. In such a scenario, you need to keep monitoring, administrating, and optimizing Solr to retain your ranking.

"Administrating Solr" is a practical, hands-on guide. It provides a number of clear, step-by-step exercises and some advanced concepts that will help you administrate, monitor, and optimize Solr using Drupal and associated scripts, and it gives you a solid grounding in how to use Apache Solr with Drupal.

The book starts with an overview of Apache Solr and the installation process to get you familiar with Solr. It then gradually moves on to the mechanisms that make Solr flexible enough to render appropriate search results in different scenarios. Along the way, it takes you through clear and practical concepts for monitoring, administrating, and optimizing Solr using both scripts and tools, shows you ways to query your index, and covers methods to keep Solr healthy and well maintained.

Distributed search


Distributed search in Solr means splitting an index into multiple shards, querying each shard, and merging the results from all of them. Imagine a situation where either the index is too large to fit on a single system, or a query takes too long to execute. How would you handle such situations? Distributed search is designed for exactly these cases.
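The merge step can be pictured with a small sketch. This is only a conceptual illustration, not Solr's internal implementation: it assumes each shard has already returned its hits as (document ID, score) pairs sorted by descending score, and the document IDs and scores are made up.

```python
import heapq
from itertools import islice

def merge_shard_results(shard_results, rows):
    """Merge per-shard hit lists into a single top-`rows` list.

    Each input list must already be sorted by descending score.
    """
    merged = heapq.merge(*shard_results, key=lambda hit: hit[1], reverse=True)
    return list(islice(merged, rows))

# Hypothetical hits from two shards: (document ID, score).
shard_a = [("doc1", 3.2), ("doc4", 1.1)]
shard_b = [("doc7", 2.5), ("doc9", 0.4)]

top = merge_shard_results([shard_a, shard_b], rows=3)
print(top)
```

Because every shard only has to rank its own slice of the index, the work per machine shrinks as shards are added, at the cost of this extra merge.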

Consider the scenario just described, where the index is too large or a query takes too long, and you want to apply distributed search to overcome it.

To do so, distribute the request across all the shards in a list using the shards parameter. The value of this parameter follows this syntax:

host:port/base_url[,host:port/base_url]
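As a minimal sketch of how such a request might be assembled, the following Python builds a query URL whose shards parameter lists two shards. The host names, ports, the /select path, and the helper name build_distributed_query are assumptions made for illustration; substitute the values of your own installation.

```python
from urllib.parse import urlencode

def build_distributed_query(coordinator, shards, query, rows=10):
    """Return a query URL that fans the request out to every listed shard.

    `coordinator` is the host:port/base_url that receives the request;
    `shards` is a list of host:port/base_url entries, one per shard.
    """
    params = {
        "q": query,
        "rows": rows,
        # Comma-separated list, matching the syntax shown above.
        "shards": ",".join(shards),
    }
    # Keep ':', '/' and ',' unescaped so the shard list stays readable.
    return f"http://{coordinator}/select?{urlencode(params, safe=':/,')}"

# Hypothetical two-shard setup on one machine.
shards = ["localhost:8983/solr", "localhost:7574/solr"]
url = build_distributed_query("localhost:8983/solr", shards, "title:solr")
print(url)
```

Any one of the shards can act as the coordinator that receives the request; it queries the listed shards and merges their responses.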

Note

You can list any number of hosts in a single request; each host you add is one more shard across which the request is distributed. The number of shards you need depends on how expensive your query is or how large your index is.

By default, the sub-request sent to each shard goes to the standard request handler, which is not necessarily the handler that received the original request; you can override this with the shards.qt parameter. The following components support distributed search:

  • Query component

  • Facet component

  • Highlighting component

  • Stats component

  • Spell check component

  • Terms component

  • Term vector component

  • Debug component

  • Grouping component
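To illustrate the shards.qt override mentioned before the list, here is a small sketch of the query string for a request whose per-shard sub-requests are directed to a specific handler. The hosts, ports, and the /search handler name are hypothetical; use a handler that is actually defined in your solrconfig.xml.

```python
from urllib.parse import urlencode

# Assumed values for illustration only: the hosts, ports, and the
# "/search" handler name are hypothetical.
params = {
    "q": "solr",
    "shards": "localhost:8983/solr,localhost:7574/solr",
    # Handler that each shard should use for its sub-request.
    "shards.qt": "/search",
}
query_string = urlencode(params, safe=":/,")
print(query_string)
```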

On the other hand, distributed search has the following limitations:

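  • Unique key requirement: every document must have a unique key, and that key must be stored

  • No distributed IDF: inverse document frequency is computed per shard, not across the whole index

  • Doesn't support QueryElevationComponent

  • Doesn't support Join

  • Index variations between stages: the index can change between the stages of a distributed request, so a document that matched in one stage may have changed by the time it is retrieved

  • Distributed deadlock: shards also send sub-requests to each other, so each shard needs enough HTTP request threads to serve both client requests and requests from other shards

  • Distributed indexing: Solr does not distribute documents across shards for you; the indexing client must decide which shard each document goes to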
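The distributed indexing limitation in the list above means the client has to choose a shard for every document it indexes. One common approach, sketched here under assumed names, is to hash the document's unique key and take the result modulo the number of shards. The shard URLs and the pick_shard helper are hypothetical, and this is one possible routing scheme rather than anything Solr prescribes.

```python
import zlib

def pick_shard(unique_key, shard_urls):
    """Choose the shard for a document from a stable hash of its unique key.

    zlib.crc32 is used instead of the built-in hash() because hash() of a
    string is randomized per Python process, which would send the same
    document to different shards on different runs.
    """
    index = zlib.crc32(unique_key.encode("utf-8")) % len(shard_urls)
    return shard_urls[index]

# Hypothetical shard list; the order must stay fixed for routing to be stable.
shard_urls = ["localhost:8983/solr", "localhost:7574/solr"]

for doc_id in ["book-1", "book-2", "book-3"]:
    print(doc_id, "->", pick_shard(doc_id, shard_urls))
```

Routing by unique key also keeps each document on exactly one shard, which fits the unique key requirement listed above. Note that changing the number of shards changes the mapping, so existing documents would need to be reindexed.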
