Solr is a Java based web application, but you don't need to be particularly familiar with Java in order to use it. With most topics, this book assumes little to no such knowledge on your part. However, if you wish to extend Solr, then you will definitely need to know Java. I also assume a basic familiarity with the command line, whether it is DOS or any Unix shell.
Before truly getting started with Solr, let's get the prerequisites out of the way. Note that if you are using Mac OS X, then you should have the needed pieces already (though you may need the developer tools add-on). If any of the -version test commands mentioned below fails, then you don't have the corresponding software installed. URLs are provided for convenience, but it is up to you to install the software according to the instructions provided at the relevant sites.
A Java Development Kit (JDK) v1.5 or later: You can download the JDK from http://java.sun.com/javase/. Typing java -version will tell you which version of Java you are using, if any, and you should type javac -version to ensure that you have the development kit too. You only need the JRE to run Solr, but you will need the JDK to compile it from source and to extend it.
Apache Ant: Any recent version should do and is available at http://ant.apache.org/. If you never modify Solr and just stick to a recent official release, then you can skip this. Note that the software provided with this book uses Ant as well, so you'll want Ant if you wish to follow along. Typing ant -version should demonstrate that you have it installed.
Subversion or Git for source control of Solr: http://subversion.tigris.org/getting.html or http://git-scm.com/. This isn't strictly necessary, but it's recommended for working with Solr's source code. If you choose to use a command line based distribution of either, then svn --version or git --version should work. Further instructions in this book are based on the command line, because it is a universal access method.
Any Java EE servlet engine app-server: This is a Java web server. Solr includes one already, Jetty, and we'll be using this throughout the book. In a later chapter, "Solr in the real world", deploying to an alternative is discussed.
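The prerequisite checks above can also be scripted. The following is only a minimal sketch, not something shipped with Solr: the class name ToolCheck and its isAvailable helper are made up for illustration, and it assumes each tool is on the PATH under exactly the name shown (on Windows, for instance, Ant is usually started through a batch file, which this simple approach will not find).

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper (not part of Solr): runs each version command
// from the prerequisites list and reports whether it succeeded.
final class ToolCheck {

    /** Returns true if the command could be started and exited with status 0. */
    static boolean isAvailable(String... command) {
        try {
            Process p = new ProcessBuilder(command)
                    .redirectErrorStream(true) // java -version writes to stderr
                    .start();
            // Drain the output so the child process cannot block on a full pipe.
            try (InputStream out = p.getInputStream()) {
                byte[] buf = new byte[4096];
                while (out.read(buf) != -1) {
                    // discard
                }
            }
            return p.waitFor() == 0;
        } catch (IOException e) {
            return false; // typically: program not found on the PATH
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        String[][] checks = {
            {"java", "-version"},
            {"javac", "-version"},
            {"ant", "-version"},
            {"svn", "--version"},
            {"git", "--version"},
        };
        for (String[] check : checks) {
            String label = String.join(" ", check);
            System.out.println(label + ": " + (isAvailable(check) ? "found" : "missing"));
        }
    }
}
```

Only java is needed to run Solr, so a "missing" result for the other tools matters only if you intend to build from source.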
Let's finally get started and get Solr running. The official site for Solr is at http://lucene.apache.org/solr, where you can download the latest official release. Solr 1.3 was released on September 15th, 2008. Solr 1.4 is expected around the same time a year later and thus is probably available as you read this. This book was written between these releases, so it covers many but not all of 1.4's features. An alternative to downloading an official release is getting the latest code from source control (that is, version control). In either case, the directory structure is conveniently identical and both include the source code. For many open source projects, the choice is almost always the last official release and not the latest source.
However, Solr's committers have made unit and integration testing a priority, as evidenced by the testing infrastructure and test code coverage of over 70 percent (http://hudson.zones.apache.org/hudson/view/Solr/job/Solr-trunk/clover/), which is very good. Many projects have none at all. As a result, the latest source is very stable, and the tests also make changes to Solr easier, because so many of them are in place to give confidence that Solr is working properly (so far as the tests test it, of course). And unlike a database, which is almost never modified to suit the needs of a project, Solr is modified often. Also note that there are a good many feature additions provided as source code patches within Solr's JIRA (its issue tracking system). The decision is of course up to you. If you are satisfied with the feature set in the latest release and/or you don't think you'll be modifying Solr at all, then the latest release is fine. One way to gauge what (completed) features are not yet in the latest official release is to visit Solr's JIRA at http://issues.apache.org/jira/browse/SOLR, and then click on Roadmap. Also, the Wiki at http://wiki.apache.org/solr/ should have features that are not yet in the latest release version marked as such.
Tip
Choose to get Solr through source control even if you are going to stick with the last official release. When/if you make changes to Solr, it will then be easier to see what those differences are. Switching to a different release becomes much easier too.
We're going to get the code through Subversion and check out the trunk (a source control term for the latest code). If you are using an IDE or some GUI tool for Subversion, then feel free to use that. The command line will suffice too. You should be able to successfully execute the following:
svn co http://svn.apache.org/repos/asf/lucene/solr/trunk/ solr_svn
That will result in Solr being checked out into the solr_svn directory. If you prefer one of the official releases, then use one of the URLs under http://svn.apache.org/repos/asf/lucene/solr/tags/ instead of the one above (put that address into your web browser to see the choices). So-called nightlies are also available if you don't want to use Subversion but want recent code.
If you prefer a downloadable pre-built Solr instead of using Subversion, then you can skip this section.
Tip
Ant basics
Apache Ant is a cross-platform build scripting tool whose scripts are specified in XML. It is largely Java oriented. An Ant script is assumed to be named build.xml in the root of a project. It contains a set of named Ant targets that you can run. To list them along with their descriptions, type ant -p to get a nice report. To run a target, simply supply it to ant as the first argument, such as ant compile. Targets often internally invoke other targets, and you'll see this in the output. In the end, Ant should report BUILD SUCCESSFUL if successful and BUILD FAILED if not. Note that Ant's use of the term 'build' is universal, even if 'build' is not an apt description of what a target performed.
Testing and building Solr is easy. Before we build Solr, we're going to test it to ensure that there are no failing tests. Simply execute the test target in Solr's installation directory by typing ant test. That should execute without any errors. On my old machine, it took about ten minutes to run. If there are errors (extremely rare), then you'll have to switch to a different version or wait a short while for the problem to be fixed. Now, to build a ready-to-install Solr, just type ant dist. This is going to fill the dist directory with some JAR files and a WAR file. If you are not familiar with Java, these files are a packaging mechanism for compiled code and related resources. These files are technically ZIP files but with a different file extension, and so you can use any ZIP file tools to view their contents. The most important one is the WAR file, which we'll be using next.
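Because JAR and WAR files are ZIP files, Java's standard java.util.zip classes can read them directly. The sketch below lists the entries of an archive. The class name ArchiveLister is invented for this example, and the default path dist/apache-solr-1.4.war is an assumption about where your build placed the WAR file.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: lists what is packaged inside a JAR or WAR file.
final class ArchiveLister {

    /** Returns the entry names of the given ZIP-format archive. */
    static List<String> entryNames(String archivePath) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(archivePath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass a different path as the first argument.
        String path = args.length > 0 ? args[0] : "dist/apache-solr-1.4.war";
        for (String name : entryNames(path)) {
            System.out.println(name);
        }
    }
}
```

Any ordinary ZIP tool gives the same listing; the point is only that no special tooling is required.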
In this section, we'll orient you to Solr's directory structure. This is not Solr's home directory, but a different place that we'll mention after this.
build: Only appears after Solr is built, to house compiled code before it is packaged. You won't need to look in here.
client: Contains convenient language-specific APIs for talking to Solr as an alternative to using your own code to send XML over HTTP. As of this writing, this only contains a couple of Ruby choices. The Java client called SolrJ is actually in src/solrj. More information on using clients to communicate with Solr is in Chapter 8.
dist: The built Solr JAR files and WAR file are here, as well as the dependencies. This directory is created and filled when Solr is built.
example: This is an installation of the Jetty servlet engine (a Java web server) including some sample data and Solr configuration. The interesting child directories are:
example/etc: Jetty's configuration. Among other things, here you can change the web port used from the pre-supplied 8983 to 80 (HTTP default).
example/multicore: Houses multiple Solr home directories in a Solr multicore setup. This will be discussed in Chapter 7.
example/solr: A Solr home directory for the default setup that we'll be using.
lib: All of Solr's API dependencies. The larger pieces are Lucene, some Apache commons utilities, and Stax for efficient XML processing.
site: This is for managing what is published on the Solr web site. You won't need to go in here.
src: Various source code. It's broken down into a few notable directories:
src/java: Solr's source code, written in Java.
src/scripts: Unix bash shell scripts, particularly useful in larger production deployments employing multiple Solr servers.
src/solrj: Solr's Java client.
src/webapp: Solr's web administration interface, including Java Servlets (source code form) and JSPs. This is mostly what constitutes the WAR file. The JSPs for the admin interface are under here in web/admin/, if you care to tweak any to your needs.
If you are a Java developer, you may have noticed that the Java source in Solr is not located in one place. It's in src/java for the majority of Solr, src/common for the parts of Solr that are common to both the server side and the SolrJ client side, src/test for the test code, and src/webapp/src for the servlet-specific code. I am merely pointing this out to help you find code, not to be critical. Solr's files are well organized.
A Solr home directory contains Solr's configuration and data (a Lucene index) for a running Solr instance. Solr includes a sample one at example/solr, which we'll be using in-place throughout most of the book. Technically, example/multicore is also a valid Solr home but for a multi-core setup, which will be discussed much later. You know you're looking at a Solr home directory when it contains either a solr.xml file (formerly multicore.xml in Solr 1.3), or both a conf and a data directory, though strictly speaking these might not be the actual requirements.
Note
data might not yet be present because you haven't started Solr yet; Solr will create it on startup if it's not present, assuming it's not configured to be named differently.
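The rule of thumb for recognizing a Solr home directory can be written down as a small check. This is a sketch of the heuristic described here and not of Solr's own startup logic; the class and method names are hypothetical.

```java
import java.io.File;

// Hypothetical helper: applies the "does this look like a Solr home?" heuristic.
// It mirrors the rule of thumb in the text, not Solr's actual requirements.
final class SolrHomeCheck {

    static boolean looksLikeSolrHome(File dir) {
        if (!dir.isDirectory()) {
            return false;
        }
        // Multi-core setups: solr.xml, or multicore.xml as it was named in Solr 1.3.
        boolean hasCoreConfig = new File(dir, "solr.xml").isFile()
                || new File(dir, "multicore.xml").isFile();
        // Single-core setups: both conf and data are present.
        boolean hasConfAndData = new File(dir, "conf").isDirectory()
                && new File(dir, "data").isDirectory();
        return hasCoreConfig || hasConfAndData;
    }

    public static void main(String[] args) {
        // Assumed default; pass another directory as the first argument.
        File dir = new File(args.length > 0 ? args[0] : "example/solr");
        System.out.println(dir + " looks like a Solr home: " + looksLikeSolrHome(dir));
    }
}
```

Since data is created only when Solr first starts, a fresh example/solr may fail this check until Solr has been run once.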
Solr's home directory is laid out like this:
bin: Suggested directory to place Solr replication scripts, if you have a more advanced setup.
conf: Configuration files. The two I mention below are very important, but it will also contain some other .txt and .xml files, which are referenced by these two files for different things such as special text analysis steps.
conf/schema.xml: This is the schema for the index including field type definitions with associated analyzer chains.
conf/solrconfig.xml: This is the primary Solr configuration file.
conf/xslt: This directory contains various XSLT files that can be used to transform Solr's XML query responses into formats such as Atom/RSS.
data: Contains the actual Lucene index data. It's binary data, so you won't be doing anything with it except perhaps deleting it occasionally.
lib: Optional placement of extra Java JAR files that Solr will load on startup, allowing you to externalize plugins from the Solr distribution (the WAR file) for convenience. If you extend Solr without modifying Solr itself, then those modifications can be deployed in a JAR file here.
It's really important to know how Solr finds its home directory. This is covered next.
In the next section, you'll start Solr. When Solr starts up, about the first thing it does is load its configuration from its home directory. Where that is exactly can be specified in several different ways.
Solr first checks for a Java system property named solr.solr.home. There are a few ways to set a Java system property, but a universal one, no matter which servlet engine you use, is through the command line where Java is invoked. You could explicitly set Solr's home like so when you start Jetty: java -Dsolr.solr.home=solr/ -jar start.jar. Alternatively, you could use Java Naming and Directory Interface (JNDI) to bind the directory path to java:comp/env/solr/home. As with Java system properties, there are multiple ways to do this. Some are app-server dependent, but a universal one is to add the following to the WAR file's web.xml located in src/webapp/web/WEB-INF (you'll find this there already but commented out).
<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-value>solr/</env-entry-value>
  <env-entry-type>java.lang.String</env-entry-type>
</env-entry>
As this is a change to web.xml, you'll need to re-run ant dist-war to repackage the WAR file, and only then redeploy it. Doing this with the Jetty supplied with Solr is insufficient, because JNDI itself isn't set up there. I'm not going to get into this further, because if you know what JNDI is and want to use it, then you'll surely figure out how to do it for your particular app-server.
Finally, if Solr's home isn't configured as a Java system property or through JNDI, then it defaults to solr/. In the examples above, I used that particular path too. We're going to simply stick with this path for the rest of this book, because this is a development, not production, setting.
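To make the lookup order concrete, here is a minimal sketch that follows the sequence described above: system property, then JNDI, then the solr/ default. It is an illustration only and is not Solr's actual source code, whose details (including the exact order of checks) may differ. The class name SolrHomeLookup is made up for this example.

```java
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;

// Hypothetical sketch of the lookup order described in the text.
// Not Solr's real implementation.
final class SolrHomeLookup {

    static final String SYSTEM_PROPERTY = "solr.solr.home";
    static final String JNDI_NAME = "java:comp/env/solr/home";
    static final String DEFAULT_HOME = "solr/";

    static String resolve() {
        // 1. Java system property, for example set with -Dsolr.solr.home=...
        String home = System.getProperty(SYSTEM_PROPERTY);
        if (home != null) {
            return home;
        }
        // 2. JNDI binding, for example from an env-entry in web.xml.
        try {
            Context context = new InitialContext();
            Object bound = context.lookup(JNDI_NAME);
            if (bound != null) {
                return (String) bound;
            }
        } catch (NamingException | RuntimeException e) {
            // No JNDI provider or no such binding: fall through to the default.
        }
        // 3. Fallback, resolved relative to the current working directory.
        return DEFAULT_HOME;
    }

    public static void main(String[] args) {
        System.out.println("Solr home would be: " + resolve());
    }
}
```

Run as a plain command-line program, with no servlet container to provide JNDI, the second step fails with a NamingException and the method falls through to the default unless the system property is set.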
Tip
In a production environment, you will almost certainly configure Solr's home rather than let it fall back to the default solr/. You will also probably use an absolute path instead of a relative one, because a relative path wouldn't work if you accidentally started your app-server from a different directory.
When troubleshooting setting Solr's home, be sure to look at the very first Solr log messages when Solr starts:
Aug 7, 2008 4:59:35 PM org.apache.solr.core.Config getInstanceDir
INFO: Solr home defaulted to 'null' (could not find system property or JNDI)
Aug 7, 2008 4:59:35 PM org.apache.solr.core.Config setInstanceDir
INFO: Solr home set to 'solr/'
This shows that Solr was left to default to solr/. You'll see this output when you start Solr, as described in the next section.
The file we're going to deploy is the file ending in .war in the dist directory (dist/apache-solr-1.4.war). The WAR file in particular is important, because this single file represents an entire Java web application. It includes Solr's JAR file, all of Solr's dependencies (which amount to other JAR files), Java Server Pages (JSPs) (which are rendered to a web browser when the WAR is deployed), and various configuration files and other web resources. It does not include Solr's home directory, however.
How one deploys a WAR file to a Java servlet engine depends on that servlet engine, but it is common for there to be a directory named something like webapps, which contains WAR files, optionally in an expanded form. By expanded, I mean that the WAR file may be uncompressed and thus be a directory by the same name. This can be a convenient deployed form in order to make changes in-place (such as to JSP files and static web files) without having to rebuild a WAR file and replace an existing one. The disadvantage is that changes are not directly tracked by source control (for example, Subversion). Another thing to note about the WAR file is that by convention, its name (without the .war extension, if present) is the path portion of the URL where the web server mounts the web application. For example, if you have an apache-solr-1.4.war file, then you would access it at http://localhost:8983/apache-solr-1.4/, assuming it's on the local machine and running at that default port.
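The naming convention can be expressed in a few lines. The sketch below only illustrates the convention as described; servlet engines can be configured to mount applications elsewhere, and the class and method names are hypothetical.

```java
// Hypothetical helper: derives the conventional URL of a web application
// from the name of its WAR file (or of its expanded directory).
final class WarContextPath {

    /** Strips any leading directories and a trailing ".war", then prefixes "/". */
    static String contextPath(String warFileName) {
        String name = warFileName;
        int slash = Math.max(name.lastIndexOf('/'), name.lastIndexOf('\\'));
        if (slash >= 0) {
            name = name.substring(slash + 1);
        }
        if (name.endsWith(".war")) {
            name = name.substring(0, name.length() - ".war".length());
        }
        return "/" + name;
    }

    static String url(String host, int port, String warFileName) {
        return "http://" + host + ":" + port + contextPath(warFileName) + "/";
    }

    public static void main(String[] args) {
        // prints http://localhost:8983/apache-solr-1.4/
        System.out.println(url("localhost", 8983, "dist/apache-solr-1.4.war"));
        // prints http://localhost:8983/solr/
        System.out.println(url("localhost", 8983, "solr.war"));
    }
}
```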
We're going to deploy this WAR file into the Jetty servlet engine included with Solr. If you are using a pre-built downloaded Solr distribution, then Solr is already deployed into Jetty as solr.war. Solr has an Ant target called example that does this (and some other things we don't care about), so you can simply run ant example. This target doesn't keep the original WAR filename when copying it; it abbreviates it to simply solr.war. This means that the URL path is just solr. By the way, because Ant targets generally call other necessary Ant targets, it was technically not necessary to run ant dist earlier in order for this step to work. Running ant example alone would not have run the tests, however.
Now we're going to start up Jetty and finally see Solr running (albeit without any data to query yet). First go to the example directory, and then run Jetty's start.jar file by typing the following commands:
cd example
java -jar start.jar
You'll see about a page of output including references to Solr. When it is finished, you should see this output at the very end of the command prompt:
2008-08-07 14:10:50.516::INFO: Started SocketConnector @ 0.0.0.0:8983
The 0.0.0.0 means it's listening to connections from any host (not just localhost, notwithstanding potential firewalls) and 8983 is the port. If Jetty reports this, then it doesn't necessarily mean that Solr was deployed successfully. You might see an error such as a stack trace in the output if something went wrong. Even if something did go wrong, you should be able to access the web server at this address: http://localhost:8983. It will show you a list of links to web applications, which will just be Solr for this setup. Solr should have this link: http://localhost:8983/solr, and if you go there, then you should see either details about an error if Solr wasn't loaded correctly, or a simple page with a link to Solr's admin page, which should be http://localhost:8983/solr/admin/. You'll be visiting that link often.
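If you would rather check these addresses from code than from a browser, a plain HTTP request is enough. The following is a rough sketch using the JDK's HttpURLConnection. The class name SolrPing and the timeout values are arbitrary choices for illustration, and the URLs assume the default Jetty setup described above.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper: asks the web server for a page and reports the HTTP status.
final class SolrPing {

    /** Returns the HTTP status code, or -1 if the server could not be reached. */
    static int statusOf(String address) {
        HttpURLConnection connection = null;
        try {
            connection = (HttpURLConnection) URI.create(address).toURL().openConnection();
            connection.setConnectTimeout(2000); // milliseconds, arbitrary
            connection.setReadTimeout(5000);    // milliseconds, arbitrary
            connection.setRequestMethod("GET");
            return connection.getResponseCode();
        } catch (IOException e) {
            return -1; // connection refused, timeout, malformed URL, and so on
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        String[] addresses = {
            "http://localhost:8983",
            "http://localhost:8983/solr",
            "http://localhost:8983/solr/admin/",
        };
        for (String address : addresses) {
            System.out.println(address + " -> " + statusOf(address));
        }
    }
}
```

A result of -1 for every address suggests Jetty isn't running at all, whereas an error status on the Solr addresses alone points to a problem loading Solr itself.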