Apache Solr 4 Cookbook

By: Rafał Kuć

Overview of this book

Apache Solr is a blazing fast, scalable, open source Enterprise search server built upon Apache Lucene. Solr is wildly popular because it supports complex search criteria, faceting, result highlighting, query-completion, query spell-checking, and relevancy tuning, amongst other numerous features.

"Apache Solr 4 Cookbook" will show you how to get the most out of your search engine. Full of practical recipes and examples, this book will show you how to set up Apache Solr, tune and benchmark performance as well as index and analyze your data to provide better, more precise, and useful search data.

"Apache Solr 4 Cookbook" will make your search better, more accurate and faster with practical recipes on essential topics such as SolrCloud, querying data, search faceting, text and data analysis, and cache configuration.

With numerous practical chapters centered on important Solr techniques and methods, Apache Solr 4 Cookbook is an essential resource for developers who wish to take their knowledge and skills further. Thoroughly updated and improved, this Cookbook also covers the changes in Apache Solr 4 including the awesome capabilities of SolrCloud.

Configuring spellchecker to not use its own index


If you are used to the way the spellchecker worked in previous Solr versions, you may remember that it required its own index to give you spelling corrections. That approach had some disadvantages: the spellchecker index had to be rebuilt, and it had to be replicated between master and slave servers. Solr 4.0 introduced a new spellchecker implementation, solr.DirectSolrSpellChecker. It allows you to use your main index to provide spelling suggestions and doesn't need to be rebuilt after every commit. So now, let's see how to use that new spellchecker implementation in Solr.

How to do it...

First of all, let's assume we have a field in the index called title, in which we hold the titles of our documents. What's more, we don't want the spellchecker to have its own index and we would like to use that title field to provide spelling suggestions. In addition to that, we would like to decide when we want a spelling suggestion. In order to do that, we need to take the following steps:

  1. First, we need to edit our solrconfig.xml file and add the spellchecking component, whose definition may look like the following code:

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <str name="queryAnalyzerFieldType">title</str>
      <lst name="spellchecker">
        <str name="name">direct</str>
        <str name="field">title</str>
        <str name="classname">solr.DirectSolrSpellChecker</str>
        <str name="distanceMeasure">internal</str>
        <float name="accuracy">0.8</float>
        <int name="maxEdits">1</int>
        <int name="minPrefix">1</int>
        <int name="maxInspections">5</int>
        <int name="minQueryLength">3</int>
        <float name="maxQueryFrequency">0.01</float>
      </lst>
    </searchComponent>
  2. Now we need to add a proper request handler configuration that will use the previously mentioned search component. To do that, we need to add the following section to the solrconfig.xml file:

    <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
      <lst name="defaults">
        <str name="df">title</str>
        <str name="spellcheck.dictionary">direct</str>
        <str name="spellcheck">on</str>
        <str name="spellcheck.extendedResults">true</str>
        <str name="spellcheck.count">5</str>
        <str name="spellcheck.collate">true</str>
        <str name="spellcheck.collateExtendedResults">true</str>
      </lst>
      <arr name="last-components">
        <str>spellcheck</str>
      </arr>
    </requestHandler>
  3. And that's all. In order to get spelling suggestions, we need to run the following query:

    /spell?q=disa
  4. In response, we will get something like the following:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">5</int>
    </lst>
    <result name="response" numFound="0" start="0">
    </result>
    <lst name="spellcheck">
      <lst name="suggestions">
        <lst name="disa">
          <int name="numFound">1</int>
          <int name="startOffset">0</int>
          <int name="endOffset">4</int>
          <int name="origFreq">0</int>
          <arr name="suggestion">
            <lst>
              <str name="word">data</str>
              <int name="freq">1</int>
            </lst>
          </arr>
        </lst>
        <bool name="correctlySpelled">false</bool>
        <lst name="collation">
          <str name="collationQuery">data</str>
          <int name="hits">1</int>
          <lst name="misspellingsAndCorrections">
            <str name="disa">data</str>
          </lst>
        </lst>
      </lst>
    </lst>
    </response>
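
A client usually wants only the suggested words and the collated query from such a response. The following is a minimal Python sketch of how they could be extracted, assuming the extended XML format shown above. The parse_spellcheck function and the SAMPLE_RESPONSE constant are hypothetical names introduced for illustration; the sample is a trimmed copy of the response above, not live Solr output.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response shown above (XML declaration and header omitted).
SAMPLE_RESPONSE = """<response>
<result name="response" numFound="0" start="0">
</result>
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="disa">
      <int name="numFound">1</int>
      <int name="origFreq">0</int>
      <arr name="suggestion">
        <lst>
          <str name="word">data</str>
          <int name="freq">1</int>
        </lst>
      </arr>
    </lst>
    <bool name="correctlySpelled">false</bool>
    <lst name="collation">
      <str name="collationQuery">data</str>
      <int name="hits">1</int>
    </lst>
  </lst>
</lst>
</response>"""


def parse_spellcheck(xml_text):
    """Return (suggestions, collation) from an extended-format XML response.

    suggestions maps each misspelled term to a list of (word, freq) pairs;
    collation is the collated query string, or None if there is none.
    """
    root = ET.fromstring(xml_text)
    suggestions = {}
    collation = None
    block = root.find("./lst[@name='spellcheck']/lst[@name='suggestions']")
    if block is None:
        return suggestions, collation
    for entry in block.findall("lst"):
        name = entry.get("name")
        # Simplification: an entry named "collation" is always treated as the
        # collation block, so a query term literally spelled "collation"
        # would be misread by this sketch.
        if name == "collation":
            query = entry.find("str[@name='collationQuery']")
            collation = query.text if query is not None else None
            continue
        words = []
        for item in entry.findall("arr[@name='suggestion']/lst"):
            word = item.find("str[@name='word']").text
            freq = int(item.find("int[@name='freq']").text)
            words.append((word, freq))
        suggestions[name] = words
    return suggestions, collation


suggestions, collation = parse_spellcheck(SAMPLE_RESPONSE)
print(suggestions)
print(collation)
```

The per-word freq values are only present because spellcheck.extendedResults is set to true in the handler; without it the suggestion entries have a simpler shape and this sketch would need to be adapted.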

If you check your data folder, you will see that it contains no directory holding a spellchecker index. So, now let's see how that works.

How it works...

Now let's get into some specifics about how the previous configuration works, starting with the search component configuration. The queryAnalyzerFieldType property tells Solr which field type's analyzer should be used to analyze the query passed to the spellchecker; its value must name a field type defined in schema.xml, so the example assumes that a type called title exists. The name property sets the name of the spellchecker, which the handler configuration refers to later. The field property specifies which field is the source of the data used to build spelling suggestions. As you probably figured out, the classname property specifies the implementation class, which in our case is solr.DirectSolrSpellChecker; this is what lets us do without a separate spellchecker index. The remaining parameters in the configuration tune how the spellchecker behaves and are beyond the scope of this recipe (if you would like to read more about them, see http://wiki.apache.org/solr/SpellCheckComponent).

The last thing is the request handler configuration. Let's concentrate on the properties that start with the spellcheck prefix. First we have spellcheck.dictionary, which specifies the name of the spellchecker we want to use (note that its value matches the value of the name property inside the spellchecker definition of the search component). We tell Solr that we want the spellchecking results to be present (the spellcheck property set to on), and that we want to see the extended results format (spellcheck.extendedResults set to true). We also ask for a maximum of five suggestions (the spellcheck.count property), and for the collation together with its extended results (spellcheck.collate and spellcheck.collateExtendedResults both set to true). Because all of these properties are defined in the defaults section, any of them can be overridden by a parameter sent with the request, which is how we decide when we want spelling suggestions.
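
Values in the defaults section apply only when the request does not send the same parameter. As a rough illustration, the Python sketch below builds request URLs for the /spell handler with and without overrides. The spell_url helper is a hypothetical name, and the SOLR_BASE address is an assumption (a local single-core example setup); adjust it to your own installation.

```python
from urllib.parse import urlencode

# Assumption: Solr runs locally with a core named collection1.
SOLR_BASE = "http://localhost:8983/solr/collection1"


def spell_url(query, overrides=None):
    """Build a URL for the /spell handler.

    Parameters given in overrides are sent with the request and therefore
    take precedence over the handler's defaults section.
    """
    params = {"q": query}
    params.update(overrides or {})
    return SOLR_BASE + "/spell?" + urlencode(params)


# Uses the handler defaults: spellchecking on, up to five suggestions.
print(spell_url("disa"))

# Switches spellchecking off for this one request.
print(spell_url("disa", {"spellcheck": "false"}))

# Keeps spellchecking on but asks for a single suggestion.
print(spell_url("disa", {"spellcheck.count": 1}))
```

Leaving spellcheck set to on in the defaults and switching it off per request is only one option; setting it to false in the defaults and sending spellcheck=true when needed works the same way in reverse.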

There's more...

Let's look at one more thing: the ability to use more than one spellchecker in a request handler.

More than one spellchecker

If you would like more than one spellchecker to handle your spelling suggestions, you can configure your handler to use several of them. For example, if you would like to use spellcheckers named word and better in addition to direct (each of them has to be defined as a spellchecker section in the spellcheck search component), you can add multiple spellcheck.dictionary parameters to your request handler. This is how your request handler configuration would look:

    <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
      <lst name="defaults">
        <str name="df">title</str>
        <str name="spellcheck.dictionary">direct</str>
        <str name="spellcheck.dictionary">word</str>
        <str name="spellcheck.dictionary">better</str>
        <str name="spellcheck">on</str>
        <str name="spellcheck.extendedResults">true</str>
        <str name="spellcheck.count">5</str>
        <str name="spellcheck.collate">true</str>
        <str name="spellcheck.collateExtendedResults">true</str>
      </lst>
      <arr name="last-components">
        <str>spellcheck</str>
      </arr>
    </requestHandler>
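
Every name listed in spellcheck.dictionary has to match the name property of a spellchecker defined in the search component. The Python sketch below shows one way to check this before deploying a configuration. The undefined_dictionaries function and the CONFIG_FRAGMENT constant are hypothetical, and the fragment is a shortened stand-in for solrconfig.xml that deliberately defines only the direct spellchecker from this recipe. The sketch looks only at the defaults section and does not distinguish between several components or handlers.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for solrconfig.xml: only "direct" is defined,
# while the handler also asks for "word" and "better".
CONFIG_FRAGMENT = """
<config>
  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">direct</str>
      <str name="field">title</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
    </lst>
  </searchComponent>
  <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="spellcheck.dictionary">direct</str>
      <str name="spellcheck.dictionary">word</str>
      <str name="spellcheck.dictionary">better</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>
</config>
"""


def undefined_dictionaries(config_xml):
    """List dictionary names used by handlers but not defined as spellcheckers."""
    root = ET.fromstring(config_xml)
    # Names of all spellchecker sections declared in search components.
    defined = {
        el.text.strip()
        for el in root.findall(
            ".//searchComponent/lst[@name='spellchecker']/str[@name='name']"
        )
    }
    # Names referenced through spellcheck.dictionary in handler defaults.
    requested = [
        el.text.strip()
        for el in root.findall(
            ".//requestHandler/lst[@name='defaults']"
            "/str[@name='spellcheck.dictionary']"
        )
    ]
    return [name for name in requested if name not in defined]


print(undefined_dictionaries(CONFIG_FRAGMENT))
```

For this fragment the check reports word and better, which is the situation the recipe warns about: both spellcheckers would have to be added to the search component before the handler could use them.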