
Apache Solr 4 Cookbook

By: Rafał Kuć

Overview of this book

Apache Solr is a blazing fast, scalable, open source Enterprise search server built upon Apache Lucene. Solr is wildly popular because it supports complex search criteria, faceting, result highlighting, query-completion, query spell-checking, and relevancy tuning, amongst other numerous features.

"Apache Solr 4 Cookbook" will show you how to get the most out of your search engine. Full of practical recipes and examples, this book will show you how to set up Apache Solr, tune and benchmark performance as well as index and analyze your data to provide better, more precise, and useful search data.

"Apache Solr 4 Cookbook" will make your search better, more accurate and faster with practical recipes on essential topics such as SolrCloud, querying data, search faceting, text and data analysis, and cache configuration.

With numerous practical chapters centered on important Solr techniques and methods, Apache Solr 4 Cookbook is an essential resource for developers who wish to take their knowledge and skills further. Thoroughly updated and improved, this Cookbook also covers the changes in Apache Solr 4 including the awesome capabilities of SolrCloud.

How to get documents right after they were sent for indexation


Let's say that we would like to get our documents as soon as they are sent for indexing, without waiting for any commit operation (hard or soft) to occur. Solr 4.0 comes with a special functionality called real-time get, which uses the information about uncommitted documents and can return them as documents. Let's see how we can use it.

How to do it...

This recipe will show how we can get documents right after they were sent for indexation.

  1. Let's begin with defining the following index structure (add it to the field section in your schema.xml file):

    <field name="id" type="string" indexed="true" 
      stored="true" required="true" />
    <field name="name" type="text" indexed="true" 
      stored="true" />
  2. In addition to this, we need the _version_ field to be present, so let's also add the following field to our schema.xml file in its field section:

    <field name="_version_" type="long" indexed="true" 
      stored="true"/>
  3. The third step is to turn on the transaction log functionality in Solr. In order to do this, we should add the following section to the updateHandler configuration section (in the solrconfig.xml file):

    <updateLog>
      <str name="dir">${solr.data.dir:}</str>
    </updateLog>
  4. The last thing we need to do is add a proper request handler configuration to our solrconfig.xml file:

    <requestHandler name="/get" 
      class="solr.RealTimeGetHandler">
      <lst name="defaults">
        <str name="omitHeader">true</str>
        <str name="indent">true</str>
        <str name="wt">xml</str>
      </lst>
    </requestHandler>
  5. Now, we can test how the handler works. In order to do this, let's index the following document (which we've stored in the data.xml file):

    <add>
      <doc>
        <field name="id">1</field>
        <field name="name">Solr 4.0 CookBook</field>
      </doc>
    </add>
  6. In order to index it, we use the following command:

    curl 'http://localhost:8983/solr/update' --data-binary @data.xml -H 'Content-type:application/xml'
    
  7. Now, let's try two things. First, let's search for the document we've just added. In order to do this, we run the following query:

    curl 'http://localhost:8983/solr/select?q=id:1'
    
  8. As you can imagine, we didn't get any documents returned, because we didn't send any commit command – not even the soft commit one. So now, let's use our defined handler:

    curl 'http://localhost:8983/solr/get?id=1'
    

    The following response will be returned by Solr:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <doc name="doc">
        <str name="id">1</str>
        <str name="name">Solr 4.0 CookBook</str>
        <long name="_version_">1418467767663722496</long>
      </doc>
    </response>

    As you can see, our document is returned by our get handler. Let's see how it works now.
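
If we want to consume that response in a program rather than read it by eye, we can parse it. The following is a minimal sketch, assuming the XML response shown in step 8 and only flat, single-valued fields; the function name `parse_get_response` is our own and not part of Solr.

```python
import xml.etree.ElementTree as ET

# Response body copied from step 8 of the recipe.
SAMPLE_RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<response>
  <doc name="doc">
    <str name="id">1</str>
    <str name="name">Solr 4.0 CookBook</str>
    <long name="_version_">1418467767663722496</long>
  </doc>
</response>"""


def parse_get_response(xml_text):
    """Return the document's fields as a dict, or None if no <doc> is present.

    Hypothetical helper: it handles only flat, single-valued fields.
    """
    root = ET.fromstring(xml_text.encode("utf-8"))
    doc = root.find("doc")
    if doc is None:
        return None
    fields = {}
    for child in doc:
        value = child.text or ""
        # Convert Solr's integer element types; keep everything else as text.
        if child.tag in ("int", "long"):
            value = int(value)
        fields[child.attrib["name"]] = value
    return fields


document = parse_get_response(SAMPLE_RESPONSE)
print(document)
```

Notice that the `_version_` value arrives in a `long` element, so the sketch converts it to an integer while leaving the other fields as text.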

How it works...

Our index structure is simple, and there is only one relevant piece of information there: the _version_ field. The real-time get functionality needs that field to be present in our documents, because the transaction log relies on it. However, as you can see in the provided example data, we don't need to worry about this field, because it is filled and updated automatically by Solr.

But let's backtrack a bit and discuss the changes made to the solrconfig.xml file. There are two things there. The first one is the update log (the updateLog section), which Solr uses to store the so-called transaction log. Solr stores recent index changes there (until a hard commit) in order to provide write durability, consistency, and the real-time get functionality.
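
To make the idea of the update log more tangible, here is a deliberately simplified toy model. It is not how Solr implements the transaction log; the class `ToyIndex` and all of its methods are our own invention, and the version counter only loosely mirrors the role of the _version_ field.

```python
class ToyIndex:
    """Toy model: a committed store plus a log of uncommitted updates."""

    def __init__(self):
        self.committed = {}   # documents visible to a standard search
        self.update_log = {}  # documents received since the last hard commit
        self.next_version = 1

    def add(self, doc):
        # Stamp each update with a version, loosely mirroring _version_.
        stamped = dict(doc, _version_=self.next_version)
        self.next_version += 1
        self.update_log[doc["id"]] = stamped

    def search_by_id(self, doc_id):
        # A standard search only sees committed documents.
        return self.committed.get(doc_id)

    def realtime_get(self, doc_id):
        # Real-time get checks the log first, then falls back to the index.
        return self.update_log.get(doc_id, self.committed.get(doc_id))

    def hard_commit(self):
        # Committing makes logged documents searchable and clears the toy log.
        self.committed.update(self.update_log)
        self.update_log.clear()


index = ToyIndex()
index.add({"id": "1", "name": "Solr 4.0 CookBook"})
before = (index.search_by_id("1"), index.realtime_get("1"))
index.hard_commit()
after = index.search_by_id("1")
```

In this model, `before` holds nothing for the search but the full document for the real-time get, and only after `hard_commit()` does the search see the document as well.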

The second thing is the handler we defined under the name of /get with the use of the solr.RealTimeGetHandler class. It uses the information in the transaction log to get the documents we want by using their identifier. It can even retrieve documents that weren't committed and are only stored in the transaction log. So, if we want to get the newest version of a document, we can use it. The other configuration parameters are the same as with the usual request handler, so I'll skip commenting on them.
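
If we call the handler from code, it helps to build the request URL in one place. The following sketch assumes the base URL used by the recipe's curl commands. To my knowledge the handler also accepts a comma-separated ids parameter and the usual fl parameter, but the recipe only shows id, so please verify both against your Solr version. The helper name `realtime_get_url` is hypothetical.

```python
from urllib.parse import urlencode

# Base URL taken from the recipe's curl commands.
SOLR_URL = "http://localhost:8983/solr"


def realtime_get_url(ids, fields=None, base_url=SOLR_URL):
    """Build a /get URL for one identifier or a list of identifiers."""
    if isinstance(ids, str):
        params = [("id", ids)]
    else:
        # Assumption: several identifiers go into one comma-separated "ids" parameter.
        params = [("ids", ",".join(ids))]
    if fields:
        # Assumption: "fl" limits the returned fields, as in a standard query.
        params.append(("fl", ",".join(fields)))
    # Keep commas readable instead of percent-encoding them.
    return base_url + "/get?" + urlencode(params, safe=",")


single = realtime_get_url("1")
several = realtime_get_url(["1", "2"], fields=["id", "name"])
print(single)
print(several)
```

The first URL matches the one used in step 8 of the recipe; the second shows how the same helper could request two documents at once.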

The next thing we do is send the update command without adding the commit command, so we shouldn't be able to see the document during a standard search. If you look at the results returned by the first query, you'll notice that we didn't get that document. However, when using the /get handler that we previously defined, we get the document we requested. This is because Solr uses the transaction log to return even uncommitted documents.
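
The difference between the two requests can also be checked programmatically. The sketch below is only an illustration: the `visibility` function and the `fake_fetch` stand-in are hypothetical, the canned response bodies are made up to resemble Solr's XML format, and identifiers are assumed to need no URL escaping.

```python
import xml.etree.ElementTree as ET


def visibility(fetch, doc_id, base_url="http://localhost:8983/solr"):
    """Report whether a document is visible to search and to real-time get.

    `fetch` is any callable that takes a URL and returns the XML body as text.
    """
    search_xml = fetch(base_url + "/select?q=id:" + doc_id)
    get_xml = fetch(base_url + "/get?id=" + doc_id)

    # A standard search reports its hit count in <result numFound="...">.
    result = ET.fromstring(search_xml).find("result")
    searchable = result is not None and int(result.get("numFound", "0")) > 0

    # The /get handler returns a <doc> element when it finds the document.
    retrievable = ET.fromstring(get_xml).find("doc") is not None

    return {"search": searchable, "realtime_get": retrievable}


def fake_fetch(url):
    # Canned bodies standing in for a live Solr instance (hypothetical data).
    if "/select" in url:
        return '<response><result name="response" numFound="0" start="0"/></response>'
    return '<response><doc name="doc"><str name="id">1</str></doc></response>'


state = visibility(fake_fetch, "1")
print(state)
```

Against a running Solr instance, we could pass a real fetch function instead, for example one built on `urllib.request.urlopen`, and expect the search flag to become true once a commit has been sent.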