Solr 1.4 Enterprise Search Server

By: David Smiley, Eric Pugh

Overview of this book

If you are a developer building a high-traffic web site, you need a terrific search engine. Sites like Netflix.com and Zappos.com employ Solr, an open source enterprise search server that uses and extends the Lucene search library. This is the first book on the market about Solr, and it shows you how to optimize your web site for high-volume traffic with full-text search capabilities and plenty of customization options, so that your users get a terrific search experience.

This book is a comprehensive reference guide for every feature Solr has to offer. It serves the reader from initiation through development to deployment. It also comes with complete running examples that demonstrate Solr's use and show how to integrate it with other languages and frameworks.

The book first gives you a quick overview of Solr, and then gradually takes you from basic to advanced features that enhance your search. It starts off by discussing Solr and helping you understand how it fits into your architecture: where databases and document/web crawlers fall short, and where Solr shines. The main part of the book is a thorough exploration of nearly every feature that Solr offers. To keep this interesting and realistic, we use a large open source set of metadata about artists, releases, and tracks, courtesy of the MusicBrainz.org project. Using this data as a testing ground for Solr, you will learn how to import it in various ways, from CSV to XML to database access. You will then learn how to search this data in a myriad of ways, including using Solr's rich query syntax, "boosting" match scores based on record data and other means, searching across multiple fields with different boosts, getting facets on the results, auto-completing user queries, spell-correcting searches, highlighting queried text in search results, and so on.

After this thorough tour, we'll demonstrate working examples of integrating a variety of technologies with Solr, such as Java, JavaScript, Drupal, Ruby, XSLT, PHP, and Python. Finally, we'll cover various deployment considerations, including indexing strategies and performance-oriented configuration, that will enable you to scale Solr to meet the needs of a high-volume site.

The schema and configuration files

Solr's configuration files are extremely well documented. We're not going to go over the details here, but this section should give you a sense of what is where.

The schema (defined in schema.xml) contains field type definitions (within the <types> tag) and lists the fields that make up your schema (within the <fields> tag), each of which references a type. The schema contains other information too, such as the primary key (the field that uniquely identifies each document, a constraint that Solr enforces) and the default search field. The sample schema in Solr uses the field named text as the default search field; confusingly, there is a field type named text too. But remember that the monitor.xml document we reviewed earlier had no field named text, right? It is common for the schema to call for certain fields to be copied to other fields, particularly to fields that are not in the input documents. So, even though the input documents don't have a field named text, there are <copyField> tags in the schema that call for the fields named cat, name, manu, features, and includes to be copied to text. This is a popular technique to speed up queries, so that queries can search over a small number of fields rather than a long list of them. Fields used this way are rarely stored, as they are needed only for querying and so only need to be indexed. There is a lot more we could talk about in the schema, but we're going to move on for now.
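To make the layout described above concrete, here is a heavily trimmed sketch of what such a schema.xml might look like. It is not the shipped sample schema: the analyzer chain is simplified, only two of the copied fields are shown, and the attribute values are assumptions chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Trimmed, illustrative schema; the real sample schema.xml is much longer -->
<schema name="example" version="1.2">
  <types>
    <!-- Field types: "string" is not analyzed, "text" is tokenized -->
    <fieldType name="string" class="solr.StrField"/>
    <fieldType name="text" class="solr.TextField">
      <!-- Simplified analyzer, assumed for brevity -->
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>

  <fields>
    <!-- Each field references one of the types above -->
    <field name="id"   type="string" indexed="true" stored="true" required="true"/>
    <field name="name" type="text"   indexed="true" stored="true"/>
    <field name="manu" type="text"   indexed="true" stored="true"/>
    <!-- Catch-all field: indexed for searching, not stored -->
    <field name="text" type="text"   indexed="true" stored="false" multiValued="true"/>
  </fields>

  <!-- Primary key and default search field -->
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>text</defaultSearchField>

  <!-- Copy source fields into the catch-all field at index time -->
  <copyField source="name" dest="text"/>
  <copyField source="manu" dest="text"/>
</schema>
```

Note that the catch-all field is declared multiValued because it receives values from more than one source field.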

Solr's solrconfig.xml file contains lots of parameters that can be tweaked. At the moment, we're just going to take a peek at the request handlers, which are defined with <requestHandler> tags. They make up about half of the file. In our first query, we didn't specify any request handler, so we got the default one. It's defined here:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <!-- default values for query parameters -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <!-- 
    <int name="rows">10</int>
    <str name="fl">*</str>
    <str name="version">2.1</str>
    -->
  </lst>
</requestHandler>

When you POST commands to Solr (such as to index a document) or query Solr (HTTP GET), the request goes through a particular request handler. Handlers can be registered against certain URL paths. When we uploaded the documents earlier, they went to the handler defined like this:

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />

The request handlers oriented to querying, which use the class solr.SearchHandler, are much more interesting.

Note

The important thing to realize about request handlers is that they are nearly completely configurable through URL parameters or POSTed form parameters. Parameters can also be specified in solrconfig.xml within lst blocks named defaults, appends, or invariants: defaults apply when the request omits a parameter, appends are added to whatever the request supplies, and invariants cannot be overridden by the request. More on this is in Chapter 4. This arrangement allows you to set up a request handler for a particular application that will be querying Solr without forcing the application to specify all of its query options.
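As a rough sketch of such an application-specific handler, the following fragment uses all three kinds of lst block. The handler name /myapp, the filter on an inStock field, and the field list are hypothetical and chosen only for illustration; they are not taken from the book's configuration.

```xml
<!-- Hypothetical handler registered at the URL path /myapp -->
<requestHandler name="/myapp" class="solr.SearchHandler">
  <!-- defaults: used only when the request does not supply the parameter -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">20</int>
  </lst>
  <!-- appends: added in addition to whatever the request supplies -->
  <lst name="appends">
    <str name="fq">inStock:true</str>
  </lst>
  <!-- invariants: always applied; the request cannot override them -->
  <lst name="invariants">
    <str name="fl">id,name,score</str>
  </lst>
</requestHandler>
```

With a handler like this, the application would only need to send its query text; the page size, the extra filter, and the returned fields would come from the configuration.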

The standard request handler defined previously doesn't really define any defaults other than which parameters are to be echoed in the response. Remember the parameter list at the top of the XML output? By changing explicit to none you can have it omitted, or you can use all, in which case you'll potentially see more parameters if other defaults happen to be configured in the request handler. This parameter can alternatively be specified in the URL through echoParams=none. Remember to separate URL parameters with ampersands.
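As a sketch of the echoParams options just described, the standard handler could be edited along the following lines. The choice of all and the uncommented rows default are illustrative assumptions, not the shipped configuration.

```xml
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <!-- "explicit" echoes only parameters sent in the request,
         "all" also echoes parameters configured in this handler,
         "none" omits the echoed parameter list entirely -->
    <str name="echoParams">all</str>
    <!-- Illustrative default; with "all" it should show up in the echoed list -->
    <int name="rows">10</int>
  </lst>
</requestHandler>
```

Because echoParams is set here as a default, a single request can still override it, for example with a URL ending in ?q=monitor&echoParams=none (the query term is only an example).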
