Solr Cookbook - Third Edition

By: Rafal Kuc

Running Solr on a standalone Jetty


The simplest way to run Apache Solr on the Jetty servlet container is to use the example configuration shipped with Solr, which is based on an embedded Jetty. However, that setup is not suited for production deployments, where you will typically have a standalone Jetty installed. In this recipe, I will show you how to configure and run Solr on a standalone Jetty container.

Getting ready

First, you need to get the Jetty servlet container for your platform. You can install it with a package manager, such as apt-get, or you can download it from http://download.eclipse.org/jetty/. In addition to this, read the Using core discovery recipe of this chapter for more information.

Tip

While writing this recipe, I used Solr Version 4.10 and Jetty Version 8.1.10. Starting with Solr 5.0, the WAR file for deployment on an external web application container will no longer be provided, and Solr will be ready for installation as it is.

How to do it...

The first step is to install the Jetty servlet container, which is beyond the scope of this book, so we will assume that you have Jetty installed in the /usr/share/jetty directory.

  1. Let's start by copying the solr.war file to the webapps directory of the installed Jetty (so that the whole path is /usr/share/jetty/webapps/solr.war). In addition to this, Jetty needs a temporary directory, so let's create the tmp directory in the Jetty installation directory (/usr/share/jetty/tmp).

  2. Next, we need to copy and adjust the solr-jetty-context.xml file from the contexts directory of the Solr example distribution to the contexts directory of the installed Jetty. The final file contents should look like this:

    <?xml version="1.0"?>
    <!DOCTYPE Configure PUBLIC "-//Jetty//Configure//EN" "http://www.eclipse.org/jetty/configure.dtd">
    <Configure class="org.eclipse.jetty.webapp.WebAppContext">
     <Set name="contextPath"><SystemProperty name="hostContext" default="/solr"/></Set>
     <Set name="war"><SystemProperty name="jetty.home"/>/webapps/solr.war</Set>
     <Set name="defaultsDescriptor"><SystemProperty name="jetty.home"/>/etc/webdefault.xml</Set>
     <Set name="tempDirectory"><Property name="jetty.home" default="."/>/tmp</Set>
    </Configure>
  3. Now, we need to copy the jetty.xml and webdefault.xml files from the etc directory of the Solr distribution to the configuration directory of Jetty; in our case, to the /usr/share/jetty/etc directory.

  4. The next step is to copy the Solr core (https://wiki.apache.org/solr/SolrTerminology) configuration files to the appropriate directory. I'm talking about files such as schema.xml, solrconfig.xml, and so forth, that is, the files that can be found in the solr/collection1/conf directory of the example Solr distribution. These files should be put in the core_name/conf directory inside the folder specified by the solr.solr.home system property (in my case, this is the /usr/share/solr directory). For example, if we want our core to be named example_data, we should put the mentioned configuration files in the /usr/share/solr/example_data/conf directory.

  5. In addition to this, we need to put the core.properties file in the /usr/share/solr/example_data directory. The file should be very simple and contain the single property, name, with the value of the name of the core, which in our case should look like the following:

    name=example_data
  6. The next step is optional and is only needed for SolrCloud deployments. For such deployments, we need to create the zoo.cfg file in the /usr/share/solr/ directory with the following contents:

    tickTime=2000
    initLimit=10
    syncLimit=5
  7. The final configuration file we need to create is the solr.xml file, which should be put in the /usr/share/solr/ directory. The contents of the file should look like this:

    <?xml version="1.0" encoding="UTF-8" ?>
    <solr>
     <solrcloud>
      <str name="host">${host:}</str>
      <int name="hostPort">${jetty.port:8983}</int>
      <str name="hostContext">${hostContext:solr}</str>
      <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
      <bool name="genericCoreNodeNames">
                 ${genericCoreNodeNames:true}</bool>
     </solrcloud>
     <shardHandlerFactory name="shardHandlerFactory"
                 class="HttpShardHandlerFactory">
      <int name="socketTimeout">${socketTimeout:0}</int>
      <int name="connTimeout">${connTimeout:0}</int>
     </shardHandlerFactory>
    </solr>
  8. The final step is to include the solr.solr.home property in the Jetty startup file. If you have installed Jetty using software such as apt-get, then you need to update the /etc/default/jetty file and add the -Dsolr.solr.home=/usr/share/solr parameter to the JAVA_OPTIONS variable of the file. The whole line with this variable will look like this:

    JAVA_OPTIONS="-Xmx256m -Djava.awt.headless=true -Dsolr.solr.home=/usr/share/solr/" 

    Note

    If you didn't install Jetty with apt-get or similar software, you might not have the /etc/default/jetty file. In this case, add the -Dsolr.solr.home=/usr/share/solr parameter to the Jetty startup file.
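
The file-copying parts of steps 1 to 5 can be sketched as a small shell function. This is only an illustration under stated assumptions: the first argument is assumed to point to the example directory of an unpacked Solr 4.10 distribution, with solr.war located in its webapps subdirectory, and the function name deploy_solr_files is hypothetical. The adjustments to solr-jetty-context.xml from step 2 and the zoo.cfg and solr.xml files from steps 6 and 7 still have to be made by hand.

```shell
#!/bin/sh
# Sketch of steps 1-5. Assumption: $1 is the "example" directory of an
# unpacked Solr 4.10 distribution; adjust the paths to your own layout.
deploy_solr_files() {
    example_dir=$1                      # hypothetical, e.g. /opt/solr-4.10.0/example
    jetty_home=${2:-/usr/share/jetty}
    solr_home=${3:-/usr/share/solr}
    core_name=${4:-example_data}

    # Step 1: the WAR file and the temporary directory
    mkdir -p "$jetty_home/webapps" "$jetty_home/tmp" \
             "$jetty_home/contexts" "$jetty_home/etc" || return 1
    cp "$example_dir/webapps/solr.war" "$jetty_home/webapps/" || return 1

    # Step 2: the context file (its contents still need to be adjusted)
    cp "$example_dir/contexts/solr-jetty-context.xml" "$jetty_home/contexts/" || return 1

    # Step 3: the Jetty configuration files prepared by Solr
    cp "$example_dir/etc/jetty.xml" "$example_dir/etc/webdefault.xml" \
       "$jetty_home/etc/" || return 1

    # Step 4: the core configuration files
    mkdir -p "$solr_home/$core_name/conf" || return 1
    cp -R "$example_dir/solr/collection1/conf/." "$solr_home/$core_name/conf/" || return 1

    # Step 5: the core.properties file with the single name property
    printf 'name=%s\n' "$core_name" > "$solr_home/$core_name/core.properties"
}
```

It could be called as deploy_solr_files followed by the path to the example directory; the remaining three arguments default to the paths used in this recipe.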

We can now run Jetty to see if everything is okay. If Jetty was installed with apt-get, start it with the following command:

/etc/init.d/jetty start

If there are no exceptions during startup, we have a running Jetty with Solr deployed and configured. To check whether Solr is running, visit http://localhost:8983/solr/.
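
The same check can be made from the command line. The following is a minimal sketch that assumes the curl tool is installed; the function name solr_is_up is made up for this illustration.

```shell
# Succeeds (exit status 0) when the given URL answers with HTTP status 200.
solr_is_up() {
    url=${1:-http://localhost:8983/solr/}
    # -s: no progress output, -L: follow redirects,
    # -o /dev/null: discard the body, -w: print only the status code
    status=$(curl -s -L -o /dev/null -w '%{http_code}' "$url")
    [ "$status" = "200" ]
}

# Usage:
#   solr_is_up && echo "Solr is running"
```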

Congratulations, you have just successfully installed, configured, and run the Jetty servlet container with Solr deployed.

How it works...

For the purpose of this recipe, I assumed that we needed a single core installation with only the schema.xml and solrconfig.xml configuration files. Multicore installation is very similar; it differs only in terms of the Solr configuration files—one needs more than a single core defined.

The first thing we did was copy the solr.war file and create the tmp directory. The WAR file is the actual Solr web application. The tmp directory will be used by Jetty to unpack the WAR file.

The solr-jetty-context.xml file that we placed in the contexts directory allows Jetty to define the context for the Solr web application. As you can see in its contents, we have set the context to be /solr, so our Solr application will be available under http://localhost:8983/solr/. We also need to specify where Jetty should look for the WAR file (the war property), where the web application descriptor file is (the defaultsDescriptor property), and finally, where the temporary directory will be located (the tempDirectory property).

Copying the jetty.xml and webdefault.xml files is important. The standard Solr distribution comes with Jetty configuration files prepared for high load; for example, they help us avoid distributed deadlocks.

The next step is to provide configuration files for the Solr core. These files should be put in the core_name/conf directory, which is created in the folder specified by the solr.solr.home system property. Since our core is named example_data, and the solr.solr.home property points to /usr/share/solr, we place our configuration files in the /usr/share/solr/example_data/conf directory. Note that I decided to use the /usr/share/solr directory as the base directory for all Solr configuration files. This makes it possible to update Jetty without the need to overwrite or delete the Solr configuration files.

The core.properties file allows Solr to identify the core that it will try to load. By providing the name property, we tell Solr what name the core should have. In our case, its name will be example_data.

The zoo.cfg file is optional, is only needed when setting up SolrCloud, and is used by Solr to specify ZooKeeper client properties. The tickTime property specifies the number of milliseconds of each tick. The tick is the unit of time in ZooKeeper client connections. The initLimit property specifies the number of ticks the initial synchronization phase can take, and the syncLimit property specifies the number of ticks that can pass between sending a request and getting an acknowledgement. For example, because the syncLimit property is set to 5 and tickTime is 2000, the maximum time between sending the request and getting the acknowledgement is 10,000 milliseconds (syncLimit multiplied by tickTime).
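
The tick arithmetic can be checked with a short shell sketch. The helper name ticks_to_ms is hypothetical; the values are the ones from the zoo.cfg file shown in this recipe.

```shell
# Convert a number of ZooKeeper ticks ($1) to milliseconds,
# given the tick length in milliseconds ($2).
ticks_to_ms() {
    echo $(( $1 * $2 ))
}

tick_time=2000
echo "initLimit: $(ticks_to_ms 10 "$tick_time") ms"   # 10 ticks
echo "syncLimit: $(ticks_to_ms 5 "$tick_time") ms"    # 5 ticks
```

With the values above, the initial synchronization phase may take up to 20,000 milliseconds, and an acknowledgement must arrive within 10,000 milliseconds.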

The solr.xml file is described in the Using core discovery recipe in this chapter.

If you installed Jetty with the apt-get command or similar software, then you need to update the /etc/default/jetty file to include the solr.solr.home property for Solr to be able to see its configuration directory.

After all these steps, we will be ready to launch Jetty. If you installed Jetty with apt-get or similar software, you can run Jetty with the /etc/init.d/jetty start command shown earlier. Otherwise, you can run Jetty with the java -jar start.jar command from the Jetty installation directory.
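
For a Jetty installation without an init script, the manual start can be sketched as follows. This assumes that java is on the PATH and that start.jar sits in the Jetty installation directory; the function name start_jetty_with_solr is hypothetical.

```shell
# Start Jetty by hand and pass the Solr home directory as a system property.
start_jetty_with_solr() {
    jetty_home=${1:-/usr/share/jetty}
    solr_home=${2:-/usr/share/solr}
    # Run in a subshell so that the caller's working directory is not changed.
    (
        cd "$jetty_home" || exit 1
        java -Dsolr.solr.home="$solr_home" -jar start.jar
    )
}
```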

After opening http://localhost:8983/solr/ in your web browser, you should see the Solr front page with a single core. Congratulations, you have successfully configured and run the Jetty servlet container with Solr deployed.

There's more...

There are a few more tasks that you can perform to counter some problems while running Solr within the Jetty servlet container. The most common tasks that I encountered during my work are described in the ensuing sections.

I want Jetty to run on a different port

Sometimes, it's necessary to run Jetty on a port other than the default one. We have two ways to achieve this:

  • Add an additional startup parameter, jetty.port. The startup command looks like this:

    java -Djetty.port=9999 -jar start.jar

  • Change the jetty.xml file. To do this, you need to change the following line:

    <Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>

    The default value should be changed to the port that we want Jetty to listen on:

    <Set name="port"><SystemProperty name="jetty.port" default="9999"/></Set>
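
The second option can also be scripted. The sketch below rewrites the default value with sed; it assumes that jetty.xml contains the jetty.port line exactly as shown above, and the function name set_default_jetty_port is hypothetical. It writes to a temporary file first, because in-place editing flags differ between sed implementations.

```shell
# Replace the default of the jetty.port system property in a jetty.xml file.
set_default_jetty_port() {
    jetty_xml=$1
    new_port=$2
    tmp_file="$jetty_xml.tmp"
    # Keep everything up to default=" and swap only the digits that follow.
    sed "s|\(<SystemProperty name=\"jetty.port\" default=\"\)[0-9]*\"|\1$new_port\"|" \
        "$jetty_xml" > "$tmp_file" && mv "$tmp_file" "$jetty_xml"
}

# Usage:
#   set_default_jetty_port /usr/share/jetty/etc/jetty.xml 9999
```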

Buffer size is too small

An overflow of the request header buffer is a common problem when our queries get too long and too complex, for example, when using many logical operators or long phrases. When the standard header buffer is not enough, you can resize it to meet your needs. To do this, add the following line to the Jetty connector in the jetty.xml file, which will specify the size of the buffer in bytes. Of course, the value shown in the example can be changed to the one that you need:

<Set name="requestHeaderSize">32768</Set>

After adding the value, the connector definition should look more or less like this:

<Call name="addConnector">
 <Arg>
  <New class="org.eclipse.jetty.server.bio.SocketConnector">
   <Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>
   <Set name="maxIdleTime">50000</Set>
   <Set name="lowResourceMaxIdleTime">1500</Set>
   <Set name="requestHeaderSize">32768</Set>
  </New>
 </Arg>
</Call>
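
To confirm which value a given jetty.xml file actually sets, a quick check along these lines may help. It is a sketch that assumes the setting is written on a single line, as in the example above; the function name configured_header_size is hypothetical.

```shell
# Print the requestHeaderSize value found in the given jetty.xml file.
# Prints nothing when the setting is absent.
configured_header_size() {
    sed -n 's|.*<Set name="requestHeaderSize">\([0-9][0-9]*\)</Set>.*|\1|p' "$1"
}

# Usage:
#   configured_header_size /usr/share/jetty/etc/jetty.xml
```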