Book Image

Administrating Solr

By : Surendra Mohan
Book Image

Administrating Solr

By: Surendra Mohan

Overview of this book

Implementing different search engines on web products is a mandate these days. Apache Solr is a robust search engine, but simply implementing Apache Solr and forgetting about it is not a good idea, especially when you have to fight for the search ranking of your web product. In such a scenario, you need to keep monitoring, administrating, and optimizing your Solr to retain your ranking. "Administrating Solr" is a practical, hands-on guide. This book will provide you with a number of clear, step-by-step exercises and some advanced concepts which will help you administrate, monitor, and optimize Solr using Drupal and associated scripts. Administrating Solr will also provide you with a solid grounding on how you can use Apache Solr with Drupal. "Administrating Solr" starts with an overview of Apache Solr and the installation process to get you familiar with Solr. It then gradually moves on to discuss the mysteries that make Solr flexible enough to render appropriate search results in different scenarios. This book will take you through clear and practical concepts that will help you monitor, administrate, and optimize your Solr appropriately using both scripts and tools. This book will also teach you ways to query your search and methods to keep your Solr healthy and well maintained. With this book, you will learn how to effectively implement and optimize Solr using Drupal.
Table of Contents (12 chapters)

Searching for a phrase


There might be situations wherein you need to search a document title within millions of documents for which string based search is of course not a good idea. So, the question for ourselves; is it possible to achieve using Solr? Fortunately, yes and the next example will guide you through it.

Assume that you have the following type defined, that needs to be added to your schema.xml file.

<fieldType name="text" class="solr.TextField" positionIncrementGap="100"> 
<analyzer> 
<tokenizer class="solr.WhitespaceTokenizerFactory"/> 
<filter class="solr.LowerCaseFilterFactory"/> 
<filter class="solr.SnowballPorterFilterFactory" language="English"/> 
</analyzer> 
</fieldType>

And then, add the following fields to your schema.xml.

<field name="id" type="string" indexed="true" stored="true" required="true" /> 
<field name="title" type="text" indexed="true" stored="true" />

Assume that your data looks like this:

<add> 
<doc>
<field name="id">1</field> 
<field name="title">2012 report</field> 
</doc> 
<doc> 
<field name="id">2</field> 
<field name="title">2007 report</field> 
</doc> 
<doc> 
<field name="id">3</field> 
<field name="title">2012 draft report</field> 
</doc> 
</add>

Now, let us instruct Solr to find the documents that have a 2012 report phrase embedded in the title. Execute the following query to Solr:

http://localhost:8080/solr/select?q=title:"2012 report"

If you get the following result, bingo !!! your query worked!

<?xml version="1.0" encoding="UTF-8"?> 
<response> 
<lst name="responseHeader"> 
<int name="status">0</int> 
<int name="QTime">1</int> 
<lst name="params"> 
<str name="q">title:"2012 report"</str> 
</lst> 
</lst> 
<result name="response" numFound="1" start="0"> 
<doc> 
<str name="id">1</str> 
<str name="title">2012 report</str> 
</doc> 
</result> 
</response>

The debug query (the debugQuery=on parameter) shows us what lucene query was made:

<str name="parsedquery">PhraseQuery(title:"2012 report")</str>

As you must have noticed, we got just one document as a result of our query, omitting even the document with the title: 2012 draft report (which is very appropriate and perfect output).

We have used only two fields to demonstrate the concept due to the fact that we are more committed to search a phrase within the title field, here in this demonstration.

Interestingly, here standard Solr query parser has been queried; hence, the field name and the associated value we are looking for can be specified. The query differs from the standard word-search query by using the " character both at the start and end of the query. It dictates Solr to consider the search as a phrase query instead of a term query (which actually makes the difference!). So, this phrase query tells Solr to search considering all the words as a single unit, and not individually.

In addition to this, the phrase query just ensured that the phrase query (that is, the desired one) was made instead of the standard term query.