Administrating Solr

By: Surendra Mohan

Overview of this book

Implementing different search engines on web products is a mandate these days. Apache Solr is a robust search engine, but simply implementing Apache Solr and forgetting about it is not a good idea, especially when you have to fight for the search ranking of your web product. In such a scenario, you need to keep monitoring, administrating, and optimizing your Solr to retain your ranking. "Administrating Solr" is a practical, hands-on guide. This book will provide you with a number of clear, step-by-step exercises and some advanced concepts which will help you administrate, monitor, and optimize Solr using Drupal and associated scripts. Administrating Solr will also provide you with a solid grounding on how you can use Apache Solr with Drupal. "Administrating Solr" starts with an overview of Apache Solr and the installation process to get you familiar with Solr. It then gradually moves on to discuss the mysteries that make Solr flexible enough to render appropriate search results in different scenarios. This book will take you through clear and practical concepts that will help you monitor, administrate, and optimize your Solr appropriately using both scripts and tools. This book will also teach you ways to query your search and methods to keep your Solr healthy and well maintained. With this book, you will learn how to effectively implement and optimize Solr using Drupal.
Prioritizing your document in search results


You might come across situations in which you need to promote some of your products and want them to appear above other documents in the search result list. You might also need this promotion to be flexible, with queries defined exclusively for these products and not for the others. To achieve this, you might consider options such as boosting, index-time boosting, or perhaps a special field. Solr offers a more direct solution through a robust component known as QueryElevationComponent.

Because QueryElevationComponent deliberately favors specific documents, it affects the overall search process for other documents as well. It is therefore recommended to use this feature only when it is required.

First of all, let us add the component definition in the solrconfig.xml file, which should look like this:

<searchComponent name="elevator" class="solr.QueryElevationComponent">
  <str name="queryFieldType">string</str>
  <str name="config-file">elevate.xml</str>
</searchComponent>

Now we will add the request handler that includes the elevation component. We will name it /promotion, because this feature is mainly used to promote your documents in search results. Add this to your solrconfig.xml file:

<requestHandler name="/promotion" class="solr.SearchHandler">
  <arr name="last-components">
    <str>elevator</str>
  </arr>
</requestHandler>

You must have noticed the elevate.xml file referenced in the query elevation component definition. It is placed in the configuration directory of the Solr instance and contains the following data:

<?xml version="1.0" encoding="UTF-8" ?>
<elevate>
  <query text="solr">
    <doc id="3" />
    <doc id="1" />
  </query>
</elevate>

Here we want the documents with identifiers 3 and 1 to appear in the first and second positions, respectively, in the search result list.

Next, add the following field definitions to the schema.xml file:
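As a hedged sketch of how this file can grow, the variant below adds a second query entry and an excluded document. The second query text (solr monitoring) is a hypothetical example that does not appear in the original configuration, and the exclude="true" attribute is assumed to be supported by your Solr version, so check the QueryElevationComponent documentation for your release before relying on it.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<elevate>
  <!-- Original rule: for the query "solr", put documents 3 and 1 on top -->
  <query text="solr">
    <doc id="3" />
    <doc id="1" />
  </query>
  <!-- Hypothetical second rule: promote document 2 for this query text and
       drop document 3 from its results (assumes exclude is supported) -->
  <query text="solr monitoring">
    <doc id="2" />
    <doc id="3" exclude="true" />
  </query>
</elevate>
```

Each rule applies only when the user's query matches the text attribute after it has been processed by the field type named in queryFieldType, so with the string type used here the match is exact.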

<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="name" type="text" indexed="true" stored="true" />

The following data has been indexed:

<add>
  <doc>
    <field name="id">1</field>
    <field name="name">Solr Optimization</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="name">Solr Monitoring</field>
  </doc>
  <doc>
    <field name="id">3</field>
    <field name="name">Solr annual report</field>
  </doc>
</add>

Now, it's time to run the following query:

http://localhost:8080/solr/promotion?q=solr

If you get the following result, you can be assured that the elevation worked:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="q">solr</str>
    </lst>
  </lst>
  <result name="response" numFound="3" start="0">
    <doc>
      <str name="id">3</str>
      <str name="name">Solr annual report</str>
    </doc>
    <doc>
      <str name="id">1</str>
      <str name="name">Solr Optimization</str>
    </doc>
    <doc>
      <str name="id">2</str>
      <str name="name">Solr Monitoring</str>
    </doc>
  </result>
</response>

In the first part of the configuration, we defined a new search component, specifying its name (elevator, in our case) and its class attribute (the solr.QueryElevationComponent class). Along with these, two additional settings define the behavior of the elevation component:

  • queryFieldType: This setting tells Solr which field type should be used to parse the query text that is given to the component. For example, if you want the component to ignore letter case, set this parameter to a field type that lowercases its contents.

  • config-file: This is the configuration file used by the component, that is, the file that defines the query elevation rules. The file resides either at ${instanceDir}/conf/${config-file} or at ${dataDir}/${config-file}. If the file exists in the /conf/ directory, it is loaded once during startup. If it exists in the data directory instead, it is reloaded for each IndexReader.
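To make the queryFieldType example concrete, here is a minimal sketch of a case-insensitive setup. The field type name lowercase is an assumption and is not part of the configuration shown in this section; solr.KeywordTokenizerFactory and solr.LowerCaseFilterFactory are standard Solr analysis classes. The first fragment would go in schema.xml and the second in solrconfig.xml.

```xml
<!-- schema.xml: keeps the whole query text as one token and lowercases it.
     The name "lowercase" is an illustrative assumption. -->
<fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

<!-- solrconfig.xml: the elevation component now analyzes the query text
     with the lowercase type instead of string -->
<searchComponent name="elevator" class="solr.QueryElevationComponent">
  <str name="queryFieldType">lowercase</str>
  <str name="config-file">elevate.xml</str>
</searchComponent>
```

With a setup along these lines, queries such as Solr, SOLR, and solr would be expected to match the same text="solr" rule in elevate.xml.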

Now, let us step into the next part of solrconfig.xml, which is the search handler definition. It tells Solr to create a new search handler named /promotion (the name attribute) that uses the solr.SearchHandler class (the class attribute). The last-components list in this definition tells Solr to include the component named elevator, which means that the search handler is going to use the component we defined. A single search handler can use more than one search component.
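As an illustration of a handler that uses more than one search component, consider the following sketch. The component name spellcheck is hypothetical here: it is assumed to be declared elsewhere in solrconfig.xml with its own searchComponent definition, which this section does not cover.

```xml
<requestHandler name="/promotion" class="solr.SearchHandler">
  <arr name="last-components">
    <!-- The elevation component defined earlier in this section -->
    <str>elevator</str>
    <!-- Hypothetical second component; it must be defined in its own
         <searchComponent name="spellcheck" ...> block to be usable -->
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

Components listed under last-components run after the handler's standard components, in the order in which they are listed.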

In elevate.xml, the actual configuration of the elevation component, there is a query defined (the query XML tag) with the attribute text="solr", which defines the behavior of the component when a user passes solr to the q parameter. Under this tag is the list of unique identifiers of the documents that will be placed at the top of the results list for that query. Each document is defined by a doc tag whose id attribute holds the unique identifier; the unique key field has to be defined with a type based on solr.StrField.

The query is made to our new handler with just a simple one-word q parameter (the default search field is set to name in the schema.xml file). Recall the elevate.xml file and the documents we defined for the query we just passed to Solr: we told Solr that we want the documents with id=3 and id=1 to be placed in the first and second positions, respectively, in the search result list. Our query worked, and the documents were placed exactly as we wanted.
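The schema.xml fragment shown earlier lists only the two fields. The unique key and the default search field mentioned in the explanation would typically be declared with entries such as the following. This is a sketch based on the description above, not a copy of the author's file, and later Solr releases deprecate defaultSearchField in favor of the df request parameter.

```xml
<!-- schema.xml: the field whose values the doc id attributes in
     elevate.xml refer to (assumed to be the id field defined earlier) -->
<uniqueKey>id</uniqueKey>

<!-- schema.xml: the field searched when q names no field, so that
     q=solr is matched against the name field -->
<defaultSearchField>name</defaultSearchField>
```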