After the release of Apache Solr 4.0, many users will want to leverage SolrCloud distributed indexing and querying capabilities. It's not hard to upgrade your current cluster to SolrCloud, but there are some things you need to take care of. With the help of the following recipe you will be able to easily upgrade your cluster.
Before continuing, it is advisable to read the Installing a standalone ZooKeeper recipe in this chapter. It shows how to set up a ZooKeeper cluster that is ready for production use.
In order to use your old index structure with SolrCloud, you will need to add the following field definition to the fields section of your schema.xml file:
<field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
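The field definition above only works if a field type named long is declared in the same schema.xml file. If your old schema does not have one, a minimal sketch of the relevant parts might look as follows. The long type here mirrors the solr.TrieLongField definition from the Solr 4 example schema. The schema name, the id field, and the string type are placeholders for whatever your existing schema already contains.

```xml
<schema name="example" version="1.5">
  <types>
    <!-- Placeholder for the types your schema already defines -->
    <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
    <!-- The "long" type referenced by the _version_ field -->
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
  </types>
  <fields>
    <!-- Placeholder for your existing unique key field -->
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <!-- Required by SolrCloud for versioning and optimistic locking -->
    <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>
```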
Now let's switch to the solrconfig.xml file, starting with the replication handler. First, you need to ensure that the replication handler is set up. Remember that you shouldn't add master-specific or slave-specific configuration to it, so its configuration should look like the following code:
<requestHandler name="/replication" class="solr.ReplicationHandler" />
In addition, the administration panel handlers need to be available, so the following configuration entry should be present in your solrconfig.xml file:
<requestHandler name="/admin/" class="solr.admin.AdminHandlers" />
The last request handler that should be present is the real-time get handler, which should also be added to the solrconfig.xml file and defined as follows:
<requestHandler name="/get" class="solr.RealTimeGetHandler">
  <lst name="defaults">
    <str name="omitHeader">true</str>
  </lst>
</requestHandler>
The next thing SolrCloud needs in order to operate properly is the transaction log configuration. The following fragment should be added to the solrconfig.xml file, inside its updateHandler section:
<updateLog>
  <str name="dir">${solr.data.dir:}</str>
</updateLog>
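As a sketch of where this fragment usually lives: in the solrconfig.xml file shipped with the Solr 4 example, updateLog is a child of the updateHandler element. The following assumes your configuration uses the default solr.DirectUpdateHandler2 update handler. Any settings your existing updateHandler section already contains should stay in place next to it.

```xml
<config>
  <!-- Other configuration omitted -->
  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- Transaction log, stored under the Solr data directory -->
    <updateLog>
      <str name="dir">${solr.data.dir:}</str>
    </updateLog>
    <!-- Your existing commit settings, if any, remain here -->
  </updateHandler>
</config>
```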
The last thing is the solr.xml file. It should point to the default core administration address: the cores tag should have the adminPath property set to /admin/cores. An example solr.xml file could look like the following code:
<solr persistent="true">
  <cores adminPath="/admin/cores" defaultCoreName="collection1" host="localhost" hostPort="8983" zkClientTimeout="15000">
    <core name="collection1" instanceDir="collection1" />
  </cores>
</solr>
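The host, port, and timeout do not have to be hardcoded. Solr configuration files support ${property:default} substitution, the same syntax used for solr.data.dir in the transaction log fragment. A possible variation of the file above is sketched here, assuming you pass system properties named host, jetty.port, and zkClientTimeout at startup. These names follow the example distribution and are otherwise an assumption; the defaults after the colon are the values from the example above.

```xml
<solr persistent="true">
  <!-- Each ${name:default} falls back to the default when -Dname=... is not given -->
  <cores adminPath="/admin/cores" defaultCoreName="collection1"
         host="${host:localhost}" hostPort="${jetty.port:8983}"
         zkClientTimeout="${zkClientTimeout:15000}">
    <core name="collection1" instanceDir="collection1" />
  </cores>
</solr>
```

With this variation, starting an instance with -Djetty.port=7574 would make it register itself under port 7574 instead of 8983, which is convenient when several instances share one machine.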
And that's all. The configuration files of your Solr instances are now ready to be used with SolrCloud.
So now let's see why all those changes are needed in order to use our old configuration files with SolrCloud.
The _version_ field is used by Solr to enable document versioning and optimistic locking, which ensures that the newest version of your document won't be overwritten by mistake. Because of that, SolrCloud requires the _version_ field to be present in your index structure. Adding that field is simple: you just need one more field definition that is stored, indexed, and based on the long type.
As for the replication handler, you should remember not to add slave or master specific configuration, only the simple request handler definition, as shown in the previous example. The same applies to the administration panel handlers: they need to be available under the default URL address.
The real-time get handler is responsible for returning updated documents right away, even if neither a commit nor a softCommit has been executed. This handler allows Solr (and also you) to retrieve the latest version of a document without reopening the searcher, and thus even if the document is not yet visible to regular search operations. The configuration is very similar to that of a usual request handler: you add a new handler with the name property set to /get and the class property set to solr.RealTimeGetHandler. In addition, we want the handler to omit response headers (the omitHeader property set to true).
One of the last things SolrCloud needs is the transaction log, which makes real-time get operations possible. The transaction log keeps track of all uncommitted changes and enables the real-time get handler to retrieve them. To turn on the transaction log, add the updateLog tag to the solrconfig.xml file and specify where the transaction log directory should be created (by adding the dir property, as shown in the example). In the configuration shown previously, we tell Solr to store the transaction log directory in the Solr data directory.
Finally, Solr needs you to keep the default address for the core administration interface, so remember to set the adminPath property to the value shown in the example (in the solr.xml file). This is needed in order for Solr to be able to manipulate cores.