Apache Solr 4 Cookbook

By Rafał Kuć

Overview of this book

Apache Solr is a blazing fast, scalable, open source Enterprise search server built upon Apache Lucene. Solr is wildly popular because it supports complex search criteria, faceting, result highlighting, query completion, query spell-checking, and relevancy tuning, amongst numerous other features.

"Apache Solr 4 Cookbook" will show you how to get the most out of your search engine. Full of practical recipes and examples, this book will show you how to set up Apache Solr, tune and benchmark performance, and index and analyze your data to provide better, more precise, and more useful search results.

"Apache Solr 4 Cookbook" will make your search better, more accurate, and faster, with practical recipes on essential topics such as SolrCloud, querying data, search faceting, text and data analysis, and cache configuration.

With numerous practical chapters centered on important Solr techniques and methods, "Apache Solr 4 Cookbook" is an essential resource for developers who wish to take their knowledge and skills further. Thoroughly updated and improved, this Cookbook also covers the changes in Apache Solr 4, including the awesome capabilities of SolrCloud.

How to search your data in a near real-time manner


Sometimes, we need our data to be available for searching as soon as possible. Imagine that we have a SolrCloud cluster up and running and we want our documents to be searchable with only a slight delay. For example, in a content management system it would be very strange if a user added a new document and then had to wait some time before it became searchable. To achieve this, Solr exposes the soft commit functionality, and this recipe will show you how to set it up.

How to do it...

This recipe will show how we can search for data in a near real-time manner.

  1. For the purpose of this recipe, let's assume that we have the following index structure (add it to the fields section of your schema.xml file):

    <field name="id" type="string" indexed="true" 
      stored="true" required="true" />
    <field name="name" type="text" indexed="true" 
      stored="true" />
  2. In addition to this, we need to set up the hard and soft automatic commits, for which we will need to add the following sections to the updateHandler section in the solrconfig.xml file:

    <autoCommit>
      <maxTime>60000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    
    <autoSoftCommit>
      <maxTime>1000</maxTime>
    </autoSoftCommit>
  3. Let's test whether that works. To do this, we index the following document (which we've stored in the data.xml file):

    <add>
      <doc>
        <field name="id">1</field>
        <field name="name">Solr 4.0 CookBook</field>
      </doc>
    </add>
  4. In order to index it, we use the following command:

    curl 'http://localhost:8983/solr/update' --data-binary @data.xml -H 'Content-type:application/xml'
    
  5. We didn't send any commit command, so we shouldn't see any documents, right? Actually, there will be one available: the one we've just sent for indexing. Let's check that by running the following simple search command:

    curl 'http://localhost:8983/solr/select?q=id:1'
    

    The following response will be returned by Solr:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">0</int>
        <lst name="params">
          <str name="q">id:1</str>
        </lst>
      </lst>
      <result name="response" numFound="1" start="0">
        <doc>
          <str name="id">1</str>
          <str name="name">Solr 4.0 CookBook</str>
        </doc>
      </result>
    </response>

    As you can see, our document was returned.
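
    If you'd like to watch the soft commit window itself, here is a quick, hypothetical check (the id and name values below are made up for illustration): index a second document, query immediately, wait out the one-second autoSoftCommit interval, and query again:

    # Index a second document, again without any commit command
    curl 'http://localhost:8983/solr/update' -H 'Content-type:application/xml' \
      --data-binary '<add><doc><field name="id">2</field><field name="name">Another book</field></doc></add>'

    # Queried immediately, the document may not be visible yet
    curl 'http://localhost:8983/solr/select?q=id:2'

    # After the 1000 millisecond soft commit window has passed, it should be returned
    sleep 2
    curl 'http://localhost:8983/solr/select?q=id:2'

    So, let's see how it works.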

How it works...

As you may know, the standard commit operation is quite resource-intensive: it flushes the changes made since the last commit to a new segment on disk. If we did that every second, we could run into the problem of a very high number of I/O writes, and our searches would suffer (of course, this depends on the situation). That's why Lucene and Solr 4.0 introduced a new commit type: the soft commit, which doesn't flush the changes to disk, but just reopens the searcher object and lets us search the data that is held in memory.
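
A soft commit can also be triggered by hand through the update handler. A minimal sketch, assuming the default update URL used throughout this recipe:

    # Explicitly ask Solr for a soft commit: the searcher is reopened,
    # but no segments are flushed to disk
    curl 'http://localhost:8983/solr/update?softCommit=true'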

As we are usually lazy and don't want to remember when it's time to send a standard commit and when to send a soft commit, we'll let Solr manage that, which means we need to configure the update handler properly. First, we add the standard automatic commit by adding the autoCommit section and saying that we want a commit to run every 60 seconds (the maxTime property is specified in milliseconds) and that we don't want to reopen the searcher after the standard commit (the openSearcher property is set to false).
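
It's worth knowing that the automatic hard commit can also be triggered by the number of pending documents, not only by time. A sketch with an illustrative threshold (the maxDocs value below is made up; tune it to your indexing load):

    <autoCommit>
      <maxTime>60000</maxTime>
      <!-- Illustrative: also commit once 10000 documents are pending -->
      <maxDocs>10000</maxDocs>
      <openSearcher>false</openSearcher>
    </autoCommit>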

The next thing is to configure the soft automatic commit functionality by adding the autoSoftCommit section to the update handler configuration. We've specified that we want the soft commit to fire every second (the maxTime property is, again, specified in milliseconds), so our searcher will be reopened every second if there are pending changes.
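
If different update streams need different visibility guarantees, you don't have to rely solely on autoSoftCommit: updates can carry a per-request commitWithin parameter (in milliseconds), which in Solr 4 results in a soft commit by default. A sketch reusing the data.xml file from this recipe (the 500 millisecond value is just an example):

    # Ask Solr to make this update searchable within 500 milliseconds
    curl 'http://localhost:8983/solr/update?commitWithin=500' --data-binary @data.xml -H 'Content-type:application/xml'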

As you can see, even though we didn't send a commit command after our update command, we are still able to find the document we sent for indexing.
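
One caveat worth remembering: a soft commit makes changes searchable, but it doesn't make them durable; durability comes from the transaction log together with the periodic hard commits. If the update log isn't already enabled in your solrconfig.xml, the usual Solr 4 configuration looks like the following (the dir property shown here is the common default):

    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>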