Book Image

Apache Solr 4 Cookbook

By : Rafał Kuć
Book Image

Apache Solr 4 Cookbook

By: Rafał Kuć

Overview of this book

<p>Apache Solr is a blazing fast, scalable, open source Enterprise search server built upon Apache Lucene. Solr is wildly popular because it supports complex search criteria, faceting, result highlighting, query-completion, query spell-checking, and relevancy tuning, amongst other numerous features.<br /><br />"Apache Solr 4 Cookbook" will show you how to get the most out of your search engine. Full of practical recipes and examples, this book will show you how to set up Apache Solr, tune and benchmark performance as well as index and analyze your data to provide better, more precise, and useful search data.<br /><br />"Apache Solr 4 Cookbook" will make your search better, more accurate and faster with practical recipes on essential topics such as SolrCloud, querying data, search faceting, text and data analysis, and cache configuration.<br /><br />With numerous practical chapters centered on important Solr techniques and methods, Apache Solr 4 Cookbook is an essential resource for developers who wish to take their knowledge and skills further. Thoroughly updated and improved, this Cookbook also covers the changes in Apache Solr 4 including the awesome capabilities of SolrCloud.</p>
Table of Contents (18 chapters)
Apache Solr 4 Cookbook
Credits
About the Author
Acknowledgement
About the Reviewers
www.PacktPub.com
Preface
Index

Running Solr on Jetty


The simplest way to run Apache Solr on a Jetty servlet container is to run the provided example configuration based on embedded Jetty. But it's not the case here. In this recipe, I would like to show you how to configure and run Solr on a standalone Jetty container.

Getting ready

First of all you need to download the Jetty servlet container for your platform. You can get your download package from an automatic installer (such as, apt-get), or you can download it yourself from http://jetty.codehaus.org/jetty/.

How to do it...

The first thing is to install the Jetty servlet container, which is beyond the scope of this book, so we will assume that you have Jetty installed in the /usr/share/jetty directory or you copied the Jetty files to that directory.

Let's start by copying the solr.war file to the webapps directory of the Jetty installation (so the whole path would be /usr/share/jetty/webapps). In addition to that we need to create a temporary directory in Jetty installation, so let's create the temp directory in the Jetty installation directory.

Next we need to copy and adjust the solr.xml file from the context directory of the Solr example distribution to the context directory of the Jetty installation. The final file contents should look like the following code:

<?xml version="1.0"?>
<!DOCTYPE Configure PUBLIC "-//Jetty//Configure//EN" "http://www.eclipse.org/jetty/configure.dtd">
<Configure class="org.eclipse.jetty.webapp.WebAppContext">
  <Set name="contextPath">/solr</Set>
  <Set name="war"><SystemProperty name="jetty.home"/>/webapps/solr.war</Set>
  <Set name="defaultsDescriptor"><SystemProperty name="jetty.home"/>/etc/webdefault.xml</Set>
  <Set name="tempDirectory"><Property name="jetty.home" default="."/>/temp</Set>
</Configure>

Tip

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Now we need to copy the jetty.xml, webdefault.xml, and logging.properties files from the etc directory of the Solr distribution to the configuration directory of Jetty, so in our case to the /usr/share/jetty/etc directory.

The next step is to copy the Solr configuration files to the appropriate directory. I'm talking about files such as schema.xml, solrconfig.xml, solr.xml, and so on. Those files should be in the directory specified by the solr.solr.home system variable (in my case this was the /usr/share/solr directory). Please remember to preserve the directory structure you'll see in the example deployment, so for example, the /usr/share/solr directory should contain the solr.xml (and in addition zoo.cfg in case you want to use SolrCloud) file with the contents like so:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores" defaultCoreName="collection1">
    <core name="collection1" instanceDir="collection1" />
  </cores>
</solr> 

All the other configuration files should go to the /usr/share/solr/collection1/conf directory (place the schema.xml and solrconfig.xml files there along with any additional configuration files your deployment needs). Your cores may have other names than the default collection1, so please be aware of that.

The last thing about the configuration is to update the /etc/default/jetty file and add –Dsolr.solr.home=/usr/share/solr to the JAVA_OPTIONS variable of that file. The whole line with that variable could look like the following:

JAVA_OPTIONS="-Xmx256m -Djava.awt.headless=true -Dsolr.solr.home=/usr/share/solr/" 

If you didn't install Jetty with apt-get or a similar software, you may not have the /etc/default/jetty file. In that case, add the –Dsolr.solr.home=/usr/share/solr parameter to the Jetty startup.

We can now run Jetty to see if everything is ok. To start Jetty, that was installed, for example, using the apt-get command, use the following command:

/etc/init.d/jetty start

You can also run Jetty with a java command. Run the following command in the Jetty installation directory:

java –Dsolr.solr.home=/usr/share/solr –jar start.jar

If there were no exceptions during the startup, we have a running Jetty with Solr deployed and configured. To check if Solr is running, try going to the following address with your web browser: http://localhost:8983/solr/.

You should see the Solr front page with cores, or a single core, mentioned. Congratulations! You just successfully installed, configured, and ran the Jetty servlet container with Solr deployed.

How it works...

For the purpose of this recipe, I assumed that we needed a single core installation with only schema.xml and solrconfig.xml configuration files. Multicore installation is very similar – it differs only in terms of the Solr configuration files.

The first thing we did was copy the solr.war file and create the temp directory. The WAR file is the actual Solr web application. The temp directory will be used by Jetty to unpack the WAR file.

The solr.xml file we placed in the context directory enables Jetty to define the context for the Solr web application. As you can see in its contents, we set the context to be /solr, so our Solr application will be available under http://localhost:8983/solr/. We also specified where Jetty should look for the WAR file (the war property), where the web application descriptor file (the defaultsDescriptor property) is, and finally where the temporary directory will be located (the tempDirectory property).

The next step is to provide configuration files for the Solr web application. Those files should be in the directory specified by the system solr.solr.home variable. I decided to use the /usr/share/solr directory to ensure that I'll be able to update Jetty without the need of overriding or deleting the Solr configuration files. When copying the Solr configuration files, you should remember to include all the files and the exact directory structure that Solr needs. So in the directory specified by the solr.solr.home variable, the solr.xml file should be available – the one that describes the cores of your system.

The solr.xml file is pretty simple – there should be the root element called solr. Inside it there should be a cores tag (with the adminPath variable set to the address where Solr's cores administration API is available and the defaultCoreName attribute that says which is the default core). The cores tag is a parent for cores definition – each core should have its own cores tag with name attribute specifying the core name and the instanceDir attribute specifying the directory where the core specific files will be available (such as the conf directory).

If you installed Jetty with the apt-get command or similar, you will need to update the /etc/default/jetty file to include the solr.solr.home variable for Solr to be able to see its configuration directory.

After all those steps we are ready to launch Jetty. If you installed Jetty with apt-get or a similar software, you can run Jetty with the first command shown in the example. Otherwise you can run Jetty with a java command from the Jetty installation directory.

After running the example query in your web browser you should see the Solr front page as a single core. Congratulations! You just successfully configured and ran the Jetty servlet container with Solr deployed.

There's more...

There are a few tasks you can do to counter some problems when running Solr within the Jetty servlet container. Here are the most common ones that I encountered during my work.

I want Jetty to run on a different port

Sometimes it's necessary to run Jetty on a different port other than the default one. We have two ways to achieve that:

  • Adding an additional startup parameter, jetty.port. The startup command would look like the following command:

    java –Djetty.port=9999 –jar start.jar
    
  • Changing the jetty.xml file – to do that you need to change the following line:

    <Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>

    To:

    <Set name="port"><SystemProperty name="jetty.port" default="9999"/></Set>

Buffer size is too small

Buffer overflow is a common problem when our queries are getting too long and too complex, – for example, when we use many logical operators or long phrases. When the standard head buffer is not enough you can resize it to meet your needs. To do that, you add the following line to the Jetty connector in thejetty.xml file. Of course the value shown in the example can be changed to the one that you need:

<Set name="headerBufferSize">32768</Set>

After adding the value, the connector definition should look more or less like the following snippet:

<Call name="addConnector">
<Arg>
<New class="org.mortbay.jetty.bio.SocketConnector">
<Set name="port"><SystemProperty name="jetty.port" default="8080"/></Set>
<Set name="maxIdleTime">50000</Set>
<Set name="lowResourceMaxIdleTime">1500</Set>
<Set name="headerBufferSize">32768</Set>
</New>
</Arg>
</Call>