Apache Solr Enterprise Search Server - Third Edition

By: David Smiley, Eric Pugh, Kranti Parisa, Matt Mitchell

Overview of this book

Apache Solr is a widely popular open source enterprise search server that delivers powerful search and faceted navigation features that are elusive with databases. Solr supports complex search criteria, faceting, result highlighting, query-completion, query spell-checking, relevancy tuning, geospatial searches, and much more.

This book is a comprehensive resource for just about everything Solr has to offer, and it will take you from first exposure to development and deployment in no time. Even if you wish to use Solr 5, you should find the information to be just as applicable due to Solr's high regard for backward compatibility. The book includes some useful information specific to Solr 5.

Configuration files


When you start Solr with the -e techproducts parameter, it loads the configuration files from /server/solr/configsets/sample_techproducts_configs. These configuration files are extremely well documented.

A Solr core's instance directory is laid out like this:

  • conf: This directory contains configuration files. The solrconfig.xml and schema.xml files are the most important, but it also contains some other .txt and .xml files that these two reference.

  • conf/schema.xml: This is the schema for the index, including field type definitions with associated analyzer chains.

  • conf/solrconfig.xml: This is the primary Solr configuration file.

  • conf/lang: This directory contains language translation .txt files that are used by several components.

  • conf/xslt: This directory contains various XSLT files that can be used to transform Solr's XML query responses into formats such as Atom and RSS. See Chapter 9, Integrating Solr.

  • conf/velocity: This includes the HTML templates and related web assets for rapid UI prototyping using Solritas, covered in Chapter 9, Integrating Solr. The previously discussed browse UI is implemented with these templates.

  • lib: This is where you can place extra Java JAR files that Solr will load on startup. It is a good place to put contrib JAR files and their dependencies. You'll need to create this directory on your own, though; it doesn't exist by default.

    Note

    Unlike typical database software, in which the configuration files don't need to be modified much (if at all) from their defaults, you will modify Solr's configuration files extensively, especially the schema. The as-provided state of these files is really just an example that both demonstrates features and documents their configuration; it should not be taken as the only way of configuring Solr. Note also that for Solr to recognize configuration changes, you must reload the core (or simply restart Solr).

Solr's schema for the index is defined in schema.xml. It contains the index's fields within the <fields> element and then the field type definitions within the <types> element. You will observe that the names of the fields in the documents we added to Solr intuitively correspond to the sample schema. Aside from the fundamental parts of defining the fields, you might also notice the <copyField> elements, which copy the value of an input field, as provided, into another field. There are various reasons for doing this, but they boil down to needing to index the same data in different ways for specific search purposes. You'll learn all that you could want to know about the schema in the next chapter.
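
To make that structure concrete, here is a heavily trimmed sketch of how such a schema.xml might be arranged. It is not the shipped sample file: the schema name and the selection of fields are assumptions chosen for illustration, although the field type classes and analyzer factories are standard Solr ones.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only: a simplified schema, not the sample file itself -->
<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="name" type="text_general" indexed="true" stored="true"/>
    <!-- Catch-all search field; it only receives copies, so it is not stored -->
    <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
  </fields>

  <uniqueKey>id</uniqueKey>

  <!-- Index the name a second time in the catch-all field -->
  <copyField source="name" dest="text"/>

  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>
</schema>
```

In this sketch, a query against the text field would match words from name even though the document itself was only given a name value, which is the kind of "index it another way" purpose that copyField serves.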

Each Solr core's solrconfig.xml file contains lots of parameters that can be tweaked. At the moment, we're just going to take a peek at the request handlers, which are defined with the <requestHandler> elements. They make up about half of the file. In our first query, we didn't specify any request handler, so we got the default one:

<requestHandler name="/select" class="solr.SearchHandler">
  <!-- Default values for query parameters can be specified; these will be
       overridden by parameters in the request. -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">text</str>
  </lst>
  <!-- … many other comments … -->
</requestHandler>

Each HTTP request to Solr, including posting documents and searches, goes through a particular request handler. Handlers can be registered against certain URL paths by naming them with a leading /. When we uploaded the documents earlier, they went to the handler defined like this, in which /update is a relative URL path:

<requestHandler name="/update" class="solr.UpdateRequestHandler" />
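
As a rough illustration of what this handler receives, the body of an XML update request looks something like the following. The document ID and name value are made up for this example; only the <add>, <doc>, and <field> structure is Solr's XML update format.

```xml
<!-- Hypothetical request body POSTed to the /update path with an XML content type -->
<add>
  <doc>
    <field name="id">EXAMPLE-001</field>
    <field name="name">Example product</field>
  </doc>
</add>
```

A document added this way becomes searchable only after a commit.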

Requests to Solr are nearly completely configurable through URL parameters or POSTed form parameters. Parameters can also be specified in the request handler definition within the <lst name="defaults"> element, such as how rows is set to 10 in the previously shown request handler. The well-documented file also explains how and when parameters can instead be placed in lst blocks named appends or invariants. This arrangement allows you to set up a request handler for a particular application that will be searching Solr, without forcing the application to specify all of its search parameters. More information on request handlers can be found in Chapter 5, Searching.
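
The following sketch shows how the three kinds of lst block might be combined in a handler dedicated to one application. The handler name /products and every parameter value are assumptions made for illustration; inStock is a field from the sample data.

```xml
<!-- Hypothetical application-specific handler; name and values are illustrative -->
<requestHandler name="/products" class="solr.SearchHandler">
  <!-- defaults: used only when the request does not supply the parameter -->
  <lst name="defaults">
    <int name="rows">20</int>
    <str name="df">text</str>
  </lst>
  <!-- appends: added to whatever the request sends; suits multi-valued parameters such as fq -->
  <lst name="appends">
    <str name="fq">inStock:true</str>
  </lst>
  <!-- invariants: always applied; a value in the request cannot override them -->
  <lst name="invariants">
    <str name="wt">json</str>
  </lst>
</requestHandler>
```

With a handler like this, the application could send only its q parameter to /products and still get filtered, JSON-formatted results, while remaining free to override rows when it needs a different page size.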