One of the most crucial properties of Apache Lucene, and thus Solr, is the Lucene directory implementation. The directory interface provides an abstraction layer over all of Lucene's I/O operations. Although choosing the right directory implementation seems simple, it can drastically affect the performance of your Solr setup. This recipe will show you how to choose the right directory implementation.
In order to use the desired directory, all you need to do is choose the right directory factory implementation and inform Solr about it. Let's assume that you would like to use NRTCachingDirectory as your directory implementation. In order to do that, you need to place (or replace if it is already present) the following fragment in your solrconfig.xml file:
<directoryFactory name="DirectoryFactory" class="solr.NRTCachingDirectoryFactory" />
And that's all. The setup is quite simple, but what directory factories are available to use? When this book was written, the following directory factories were available:
solr.StandardDirectoryFactory
solr.SimpleFSDirectoryFactory
solr.NIOFSDirectoryFactory
solr.MMapDirectoryFactory
solr.NRTCachingDirectoryFactory
solr.RAMDirectoryFactory
So now let's see what each of those factories provides.
Before we get into the details of each of the presented directory factories, I would like to comment on the directory factory configuration parameter. All you need to remember is that the name attribute of the directoryFactory tag should be set to DirectoryFactory and the class attribute should be set to the directory factory implementation of your choice.
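If you would rather not hardcode the class attribute, Solr's property substitution syntax can be used in solrconfig.xml. The following fragment is a minimal sketch of that approach; the property name solr.directoryFactory is a convention and is an assumption here, so any property name of your choice should work as long as you pass the same name at startup:

```xml
<!-- Reads the factory class from the solr.directoryFactory system property. -->
<!-- Falls back to solr.NRTCachingDirectoryFactory when the property is not set. -->
<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}" />
```

With such a fragment in place, starting the JVM with, for example, -Dsolr.directoryFactory=solr.MMapDirectoryFactory should switch the implementation without editing the configuration file.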
If you want Solr to make the decision for you, you should use solr.StandardDirectoryFactory. This is a filesystem-based directory factory that tries to choose the best implementation based on your current operating system and the Java virtual machine used. If you are implementing a small application that won't use many threads, you can use solr.SimpleFSDirectoryFactory, which stores the index files on your local filesystem, but it doesn't scale well with a high number of threads. solr.NIOFSDirectoryFactory scales well with many threads, but it doesn't work well on Microsoft Windows platforms (it's much slower there) because of a JVM bug, so keep that in mind.
solr.MMapDirectoryFactory was the default directory factory for Solr on 64-bit Linux systems from Solr 3.1 until 4.0. This directory implementation uses virtual memory and a kernel feature called mmap to access index files stored on disk. This allows Lucene (and thus Solr) to directly access the I/O cache. This is desirable, and you should stick to that directory if near real-time searching is not needed.
If you need near real-time indexing and searching, you should use solr.NRTCachingDirectoryFactory. It is designed to store some parts of the index in memory (small chunks) and thus speed up some near real-time operations greatly.
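How much of the index is kept in memory can usually be tuned. The following fragment is a sketch under the assumption that your Solr version supports the maxMergeSizeMB and maxCachedMB parameters on this factory; the values shown are illustrative only, so check the documentation of your release before relying on them:

```xml
<directoryFactory name="DirectoryFactory" class="solr.NRTCachingDirectoryFactory">
  <!-- Assumed parameter: only segments up to this size (in MB) are cached in memory. -->
  <double name="maxMergeSizeMB">4.0</double>
  <!-- Assumed parameter: upper bound (in MB) on the total memory used by the cache. -->
  <double name="maxCachedMB">48.0</double>
</directoryFactory>
```

Larger values keep more freshly written segments in memory at the cost of heap space, so they are worth raising only when the index is updated very frequently.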
The last directory factory, solr.RAMDirectoryFactory, is the only one that is not persistent. The whole index is stored in RAM, and thus you'll lose your index after a restart or a server crash. You should also remember that replication won't work when using solr.RAMDirectoryFactory. You may ask, why should I use that factory? Imagine a volatile index for an autocomplete functionality or for unit tests of your queries' relevancy; in short, any case where you don't need persistent and replicated data. However, please remember that this directory is not designed to hold large amounts of data.
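For such a volatile core, the configuration follows the same pattern as before. The following fragment is a minimal sketch intended for a test or autocomplete core only; it assumes the core can be fully rebuilt from its source data after every restart:

```xml
<!-- Non-persistent index: everything is lost on restart, and replication is not available. -->
<directoryFactory name="DirectoryFactory" class="solr.RAMDirectoryFactory" />
```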