Let's say that we would like to retrieve our documents as soon as they have been sent for indexing, before any commit operation (hard or soft) has occurred. Solr 4.0 comes with a special functionality called real-time get, which uses information about uncommitted documents and can return them like any other document. Let's see how we can use it.
This recipe will show how we can get documents right after they were sent for indexing.
Let's begin by defining the following index structure (add it to the field section in your schema.xml file):
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="name" type="text" indexed="true" stored="true" />
In addition to this, we need the _version_ field to be present, so let's also add the following field to our schema.xml file in its field section:
<field name="_version_" type="long" indexed="true" stored="true"/>
The third step is to turn on the transaction log functionality in Solr. In order to do this, we should add the following section to the updateHandler configuration section (in the solrconfig.xml file):
<updateLog>
  <str name="dir">${solr.data.dir:}</str>
</updateLog>
The last thing we need to do is add a proper request handler configuration to our solrconfig.xml file:
<requestHandler name="/get" class="solr.RealTimeGetHandler">
  <lst name="defaults">
    <str name="omitHeader">true</str>
    <str name="indent">true</str>
    <str name="wt">xml</str>
  </lst>
</requestHandler>
Now, we can test how the handler works. In order to do this, let's index the following document (which we've stored in the data.xml file):
<add>
  <doc>
    <field name="id">1</field>
    <field name="name">Solr 4.0 CookBook</field>
  </doc>
</add>
In order to index it, we use the following command:
curl 'http://localhost:8983/solr/update' --data-binary @data.xml -H 'Content-type:application/xml'
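For readers who script their indexing, the same update could be prepared from Python. The following is only a minimal sketch using the standard library: the helper name build_update_request is hypothetical, the URL simply mirrors the one in the curl command, and the request is built but not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_update_request(update_url, docs):
    """Build (but do not send) a POST request carrying an <add> payload.

    `docs` is a list of dicts mapping field names to values. The helper
    name and its arguments are illustrative; they are not part of Solr.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    # Serialize to text first, then encode, so no XML declaration is added.
    payload = ET.tostring(add, encoding="unicode").encode("utf-8")
    return urllib.request.Request(
        update_url,
        data=payload,
        headers={"Content-Type": "application/xml"},
    )


request = build_update_request(
    "http://localhost:8983/solr/update",
    [{"id": "1", "name": "Solr 4.0 CookBook"}],
)
# No commit parameter is included, so the document stays uncommitted.
# To actually send it: urllib.request.urlopen(request)
```

As with the curl command, nothing here asks Solr to commit, which is exactly the situation this recipe relies on.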
Now, let's try two things. First, let's search for the document we've just added. In order to do this, we run the following query:
curl 'http://localhost:8983/solr/select?q=id:1'
As you can imagine, we didn't get any documents returned, because we didn't send any commit command, not even a soft commit. So now, let's use the handler we defined:
curl 'http://localhost:8983/solr/get?id=1'
The following response will be returned by Solr:
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <doc name="doc">
    <str name="id">1</str>
    <str name="name">Solr 4.0 CookBook</str>
    <long name="_version_">1418467767663722496</long>
  </doc>
</response>
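If the response is consumed by a program rather than read by eye, it could be turned into a plain dictionary. The sketch below assumes the XML layout shown in the response above (a doc element whose children carry a name attribute); the function name parse_realtime_get is hypothetical.

```python
import xml.etree.ElementTree as ET

# The response shown above, kept as bytes so the encoding declaration is honored.
SAMPLE_RESPONSE = b"""<?xml version="1.0" encoding="UTF-8"?>
<response>
  <doc name="doc">
    <str name="id">1</str>
    <str name="name">Solr 4.0 CookBook</str>
    <long name="_version_">1418467767663722496</long>
  </doc>
</response>"""


def parse_realtime_get(xml_bytes):
    """Return the document's fields as a dict, or None if no <doc> is present."""
    root = ET.fromstring(xml_bytes)
    doc = root.find("doc")
    if doc is None:
        return None
    fields = {}
    for child in doc:
        text = child.text or ""
        # Numeric element types are converted; everything else stays a string.
        fields[child.get("name")] = int(text) if child.tag in ("long", "int") else text
    return fields


document = parse_realtime_get(SAMPLE_RESPONSE)
```

Note that the _version_ value comes back even though we never supplied it in data.xml.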
As you can see, our document is returned by our get handler. Let's see how it works now.
Our index structure is simple, and there is only one piece of information that needs a comment: the _version_ field. The real-time get functionality needs that field to be present in our documents, because the transaction log relies on it. However, as you can see in the example data, we don't need to supply this field ourselves, because it's filled in and updated automatically by Solr.
But let's backtrack a bit and discuss the changes made to the solrconfig.xml file. There are two things there. The first one is the update log (the updateLog section), which configures the so-called transaction log. Solr stores recent index changes there (until a hard commit) in order to provide write durability and consistency, and to make the real-time get functionality possible.
The second thing is the handler we defined under the name /get, using the solr.RealTimeGetHandler class. It uses the information in the transaction log to return the documents we request by their identifier. It can even retrieve documents that haven't been committed yet and are stored only in the transaction log. So, if we want to get the newest version of a document, we can use it. The other configuration parameters are the same as for a usual request handler, so I'll skip commenting on them.
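Because the handler looks documents up by identifier, a client only needs to build the right URL. Here is a minimal sketch: the function name realtime_get_url is hypothetical, and it assumes that the handler accepts an id parameter for one identifier (as in this recipe) and a comma-separated ids parameter for several.

```python
from urllib.parse import urlencode


def realtime_get_url(base_url, doc_ids):
    """Build a /get URL for one or more document identifiers.

    Assumes `id` for a single identifier and a comma-separated `ids`
    parameter for several; the helper itself is illustrative only.
    """
    ids = [str(doc_id) for doc_id in doc_ids]
    if not ids:
        raise ValueError("at least one document identifier is required")
    if len(ids) == 1:
        query = urlencode({"id": ids[0]})
    else:
        # safe="," keeps the separator readable instead of encoding it as %2C
        query = urlencode({"ids": ",".join(ids)}, safe=",")
    return f"{base_url}/get?{query}"


single = realtime_get_url("http://localhost:8983/solr", [1])
several = realtime_get_url("http://localhost:8983/solr", [1, 2, 3])
```

The single-identifier form produces the same URL we passed to curl earlier.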
The next thing we do is send the update command without a commit command, so the document shouldn't be visible to a standard search. If you look at the results returned by the first query, you'll notice that we didn't get that document. However, when using the /get handler that we previously defined, we get the document we requested. This is because Solr uses the transaction log to return even uncommitted documents.
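The difference between the two lookups can be illustrated with a toy model. This is not how Solr is implemented: the class ToyIndex and its methods are invented for illustration, the version is a simple counter rather than Solr's real version value, and the distinction between soft and hard commits is ignored.

```python
class ToyIndex:
    """Toy model: a searchable index plus a log of uncommitted updates."""

    def __init__(self):
        self.committed = {}  # documents visible to a standard search
        self.tlog = {}       # recent updates, not yet visible to search
        self._version = 0

    def add(self, doc):
        # Each update gets a new version and lands in the log only.
        self._version += 1
        self.tlog[doc["id"]] = dict(doc, _version_=self._version)

    def search_by_id(self, doc_id):
        # A standard search sees committed documents only.
        return self.committed.get(doc_id)

    def realtime_get(self, doc_id):
        # Real-time get checks the log first, then the committed index.
        if doc_id in self.tlog:
            return self.tlog[doc_id]
        return self.committed.get(doc_id)

    def commit(self):
        # A commit makes logged updates searchable.
        self.committed.update(self.tlog)
        self.tlog.clear()


index = ToyIndex()
index.add({"id": "1", "name": "Solr 4.0 CookBook"})
search_before = index.search_by_id("1")   # nothing committed yet
get_before = index.realtime_get("1")      # served from the log
index.commit()
search_after = index.search_by_id("1")    # now visible to search
```

In this model, as in the recipe, the search returns nothing before the commit while the real-time lookup already returns the document.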