Administrating Solr

By: Surendra Mohan

Overview of this book

Implementing different search engines on web products is a mandate these days. Apache Solr is a robust search engine, but simply implementing Apache Solr and forgetting about it is not a good idea, especially when you have to fight for the search ranking of your web product. In such a scenario, you need to keep monitoring, administrating, and optimizing your Solr to retain your ranking. "Administrating Solr" is a practical, hands-on guide. This book will provide you with a number of clear, step-by-step exercises and some advanced concepts which will help you administrate, monitor, and optimize Solr using Drupal and associated scripts. Administrating Solr will also provide you with a solid grounding on how you can use Apache Solr with Drupal. "Administrating Solr" starts with an overview of Apache Solr and the installation process to get you familiar with Solr. It then gradually moves on to discuss the mysteries that make Solr flexible enough to render appropriate search results in different scenarios. This book will take you through clear and practical concepts that will help you monitor, administrate, and optimize your Solr appropriately using both scripts and tools. This book will also teach you ways to query your search and methods to keep your Solr healthy and well maintained. With this book, you will learn how to effectively implement and optimize Solr using Drupal.
Table of Contents (12 chapters)

Boosting phrases over words


In a competitive market, assume that one day the quality of your product's search results suddenly drops. To recover from this and stay competitive, you will probably want to favor documents that contain the exact phrase typed by the end user over documents that only match the words separately. This section shows how to achieve this.

We assume that the dismax query parser is used instead of the standard one. We also reuse the schema.xml that was demonstrated in the Searching for a phrase section in this chapter.

Our sample data looks like this:

<add> 
<doc> 
<field name="id">1</field> 
<field name="title">Annual 2012 report final draft</field> 
</doc> 
<doc> 
<field name="id">2</field> 
<field name="title">2007 report</field> 
</doc> 
<doc> 
<field name="id">3</field> 
<field name="title">2012 draft report</field> 
</doc> 
</add>

As mentioned earlier, we would like to boost documents that match the query as a phrase over documents that only match its individual words. To achieve this, run the following query against your Solr instance:

http://localhost:8080/solr/select?defType=dismax&pf=title^100&q=2012+report&qf=title
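Typing such a URL by hand makes it easy to introduce stray spaces or unescaped characters. The following minimal sketch builds the same query with Python's standard library. The host, port, and path are copied from the example above, and the function name `build_phrase_boost_url` is hypothetical, introduced only for illustration.

```python
from urllib.parse import urlencode

# Assumption: the Solr instance from the example above.
SOLR_SELECT = "http://localhost:8080/solr/select"


def build_phrase_boost_url(words, field="title", phrase_boost=100):
    """Return a dismax query URL that boosts phrase matches in `field`.

    Hypothetical helper for illustration; it is not part of Solr.
    """
    params = {
        "defType": "dismax",              # use the dismax query parser
        "qf": field,                      # field searched for the individual words
        "pf": f"{field}^{phrase_boost}",  # field and boost used for phrase matches
        "q": " ".join(words),             # the words typed by the end user
    }
    # urlencode escapes the space in q as '+' and the '^' in pf as '%5E'.
    return f"{SOLR_SELECT}?{urlencode(params)}"


url = build_phrase_boost_url(["2012", "report"])
print(url)
```

The parameter order in the generated URL differs from the hand-written one, which does not matter to Solr.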

The result should look like this:

<?xml version="1.0" encoding="UTF-8"?> 
<response> 
<lst name="responseHeader"> 
<int name="status">0</int> 
<int name="QTime">1</int> 
<lst name="params"> 
<str name="qf">title</str> 
<str name="pf">title^100</str> 
<str name="q">2012 report</str> 
<str name="defType">dismax</str>
</lst> 
</lst> 
<result name="response" numFound="2" start="0"> 
<doc> 
<str name="id">1</str> 
<str name="title">Annual 2012 report final draft</str> 
</doc> 
<doc> 
<str name="id">3</str> 
<str name="title">2012 draft report</str> 
</doc> 
</result> 
</response>
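To check the ranking programmatically rather than by eye, the response can be parsed with Python's standard library. This is a minimal sketch: `SAMPLE_RESPONSE` is a trimmed copy of the response above that keeps only the id fields, and `ranked_ids` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response shown above (only the id fields are kept).
SAMPLE_RESPONSE = """<response>
<result name="response" numFound="2" start="0">
<doc>
<str name="id">1</str>
</doc>
<doc>
<str name="id">3</str>
</doc>
</result>
</response>"""


def ranked_ids(xml_text):
    """Return the document ids in the order Solr ranked them."""
    root = ET.fromstring(xml_text)
    ids = []
    for doc in root.findall("./result[@name='response']/doc"):
        # Each <doc> holds one child element per field; pick the one named "id".
        id_field = doc.find("./str[@name='id']")
        ids.append(id_field.text)
    return ids


ranking = ranked_ids(SAMPLE_RESPONSE)
print(ranking)
```

Document 1, whose title contains the exact phrase, comes before document 3.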

This example uses a couple of parameters that might be new to you. The first is defType, which tells Solr which query parser to use (dismax in our case). If you would like to learn more about dismax, see http://wiki.apache.org/solr/DisMax.

One of the features of this query parser is the ability to tell Solr which fields should be searched for phrases, using the pf parameter. The pf parameter takes a list of fields with their corresponding boosts; for instance, pf=title^100 means that a phrase found in the title field will be boosted with a value of 100.

The q parameter is the standard query parameter, which you are probably familiar with. In our example, it carries the two words we are searching for. The response contains only documents whose title includes both '2012' and 'report', as if the words were joined by the AND operator; with dismax, the number of words that must match is controlled by the mm (minimum should match) parameter. Document 1 is ranked first because its title contains the exact phrase '2012 report', whereas the title of document 3 contains both words but not next to each other.
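The difference between matching every word and matching the exact phrase can be illustrated on the sample data with a toy check. This is only a simplified sketch that splits titles on whitespace; it does not reproduce Solr's analysis or scoring, and the function names are hypothetical.

```python
def tokens(text):
    """Lowercase the text and split it on whitespace (a stand-in for analysis)."""
    return text.lower().split()


def matches_all_words(title, query):
    """True if every query word occurs somewhere in the title."""
    title_words = set(tokens(title))
    return all(word in title_words for word in tokens(query))


def contains_phrase(title, query):
    """True if the query words occur in the title next to each other, in order."""
    t, q = tokens(title), tokens(query)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))


# Titles from the sample data, keyed by document id.
TITLES = {
    "1": "Annual 2012 report final draft",
    "2": "2007 report",
    "3": "2012 draft report",
}
QUERY = "2012 report"

for doc_id, title in TITLES.items():
    print(doc_id, matches_all_words(title, QUERY), contains_phrase(title, QUERY))
```

Documents 1 and 3 contain both words, but only document 1 contains them as a phrase, which is why the pf boost places it first.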

Tip

Remember that you cannot pass a query such as fieldname:value to the q parameter when using the dismax query parser. The fields you are searching against must be specified using the qf parameter.