Book Image

Solr Cookbook - Third Edition

By : Rafal Kuc
Book Image

Solr Cookbook - Third Edition

By: Rafal Kuc

Overview of this book

Table of Contents (18 chapters)
Solr Cookbook Third Edition
Credits
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Preface
Index

Migrating configuration from master-slave to SolrCloud


After the release of Apache Solr 4.0, many users wanted to leverage SolrCloud-distributed indexing and querying capabilities. SolrCloud is also very useful when it comes to handling collections as you can create them on-the-fly, add replicas, and split already created shards, and this is only an example of the possibilities given by SolrCloud. Now, for releases after Solr 4.0, people are going for SolrCloud even more frequently. It's not hard to upgrade your current master-slave configuration to work on SolrCloud, but there are some things you need to take care of. With the help of the following recipe, you will be able to easily upgrade your cluster.

Getting ready

Before continuing further, it is advised to read the Installing Zookeeper for SolrCloud and Running Solr on a standalone Jetty recipes of this chapter. They will show you how to set up a Zookeeper cluster to be ready for production use and how to configure Jetty and Solr to work with each other.

How to do it...

  1. We will start with altering the schema.xml file. In order to use your old index structure with SolrCloud, you need to add the following fields to the already defined index structure (add the following fragment to the schema.xml file in its fields section):

    <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
  2. Now, let's switch to the solrconfig.xml file, starting with the replication handlers. First, you need to ensure that you have a replication handler set up. Remember that you shouldn't add master- or slave-specific configurations to it. So, the replication handler configuration should look like this:

    <requestHandler name="/replication" class="solr.ReplicationHandler" />
  3. In addition to this, you need to have the administration panel handlers present, so the following configuration entry should be present in your solrconfig.xml file:

    <requestHandler name="/admin/" class="solr.admin.AdminHandlers" />
  4. The last request handler that should be present is the real-time get handler, which should be defined as follows (the following should also be added to the solrconfig.xml file):

    <requestHandler name="/get" class="solr.RealTimeGetHandler">
     <lst name="defaults">
      <str name="omitHeader">true</str>
      <str name="wt">json</str>
     </lst>
    </requestHandler>
  5. The next thing SolrCloud needs in order to properly operate is the transaction log configuration. The following fragment should be added to the solrconfig.xml file:

    <updateLog>
     <str name="dir">${solr.data.dir:}</str>
    </updateLog>
  6. The last thing is the solr.xml file. The example solr.xml file should look like this:

    <solr>
     <solrcloud>
      <str name="host">${host:}</str>
      <int name="hostPort">${jetty.port:8983}</int>
      <str name="hostContext">${hostContext:solr}</str>
      <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
      <bool   name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
     </solrcloud>
     <shardHandlerFactory name="shardHandlerFactory"   class="HttpShardHandlerFactory">
      <int name="socketTimeout">${socketTimeout:0}</int>
      <int name="connTimeout">${connTimeout:0}</int>
     </shardHandlerFactory>
    </solr>

That's all. Your Solr instance configuration files are now ready to be used with SolrCloud.

How it works...

Now, let's see why all these changes are needed in order to use our old configuration files with SolrCloud.

The _version_ field is used by Solr to enable document versioning and optimistic locking, which ensures that you won't have the newest version of your document overwritten by mistake. As a result of this, SolrCloud requires the _version_ field to be present in the index structure. Adding this field is simple—you just need to place another field definition that is stored, indexed, and based on a long type, that's all.

As for the replication handler, you should remember not to add slave- or master-specific configurations, but only a simple request handler definition, as shown in the previous example. The same applies to the administration panel handlers; they need to be available under the default URL address.

The real-time get handler is responsible for getting the updated documents right away. In general, the documents are not available to search if the Lucene index searcher is not open, which happens after a hard or soft commit command (we will talk more about commit and soft commit in the Configuring SolrCloud for NRT use cases recipe of this chapter). This handler allows Solr (and also you) to retrieve the latest version of the document without the need to reopen the searcher, and thus, even if the document is not yet visible during a usual search operation. This is done by using the transaction log if the document is not yet indexed. The configuration is very similar to usual request handler configurations; you need to add a new handler with the name property set to /get and the class property set to solr.RealTimeGetHandler. In addition to this, we want the handler to omit response headers (the omitHeader property set to true) and return a response in JSON (with the wt property set to json). We omit the headers so that we have responses that are easier to parse.

One of the last things that is needed by SolrCloud is the transaction log, which enables real-time get operations to be functional. The transaction log keeps track of all the uncommitted changes and enables real-time get handlers to retrieve them. In order to turn on transaction log usage, one should add the updateLog tag to the solrconfig.xml file and specify the directory where the transaction log directory should be created (by adding the dir property, as shown in the example). In the previous configuration, we tell Solr that we want to use the Solr data directory as the place to store transaction log directories.

Finally, Solr needs you to keep the default address for the core administrative interface, so you should remember to have the adminPath property set to the value shown in the example (in the solr.xml file). This is needed in order for Solr to be able to manipulate cores.

We already talked about the solr.xml file contents in the Running Solr on a standalone Jetty recipe in this chapter, so refer to that recipe if you are not familiar with the contents.