Apache Solr Enterprise Search Server - Third Edition

By: David Smiley, Eric Pugh, Kranti Parisa, Matt Mitchell

Overview of this book

Apache Solr is a widely popular open source enterprise search server that delivers powerful search and faceted navigation features, features that are elusive with databases. Solr supports complex search criteria, faceting, result highlighting, query completion, query spell-checking, relevancy tuning, geospatial search, and much more.

This book is a comprehensive resource for just about everything Solr has to offer, and it will take you from first exposure to development and deployment in no time. Even if you wish to use Solr 5, you should find the information just as applicable, thanks to Solr's high regard for backward compatibility. The book also includes some useful information specific to Solr 5.

Preface

If you are a developer building an application today, then you know how important a good search experience is. Apache Solr, built on Apache Lucene, is a wildly popular open source enterprise search server that easily delivers the powerful search and faceted navigation features that are elusive with databases. Solr supports complex search criteria, faceting, result highlighting, query-completion, query spellcheck, relevancy tuning, and more.

Apache Solr Enterprise Search Server, Third Edition is a comprehensive resource for almost everything Solr has to offer. It takes the reader from initial exposure through development to deployment. It also comes with complete running examples that demonstrate its use and show how to integrate Solr with other languages and frameworks, even Hadoop.

By using a large set of metadata, including artists, releases, and tracks, courtesy of the MusicBrainz.org project, you will have a testing ground for Solr and will learn how to import this data in various ways. You will then learn how to search this data in different ways, including Solr's rich query syntax and boosting match scores based on record data. Finally, we'll cover various deployment considerations, including indexing strategies and performance-oriented configuration, that will enable you to scale Solr to meet the needs of a high-volume site.

Note

Solr 4 or Solr 5?

See the What you need for this book section further below.

What this book covers

Chapter 1, Quick Starting Solr, introduces Solr to you so that you understand its unique role in your application stack. You'll get started quickly by indexing example data and searching it with Solr's sample /browse UI. This chapter is oriented to Solr 5, but most of its content applies to Solr 4 too.

Chapter 2, Schema Design, guides you through an approach to modeling your data within Solr into one or more Solr indices and schemas. It covers the schema thoroughly and explores some of Solr's field types.

Chapter 3, Text Analysis, covers how to customize text tokenization, stemming, synonyms, and related matters to have fine control over keyword search matching. It also covers multilingual strategies.

Chapter 4, Indexing Data, explores all of the options Solr offers for importing data, such as XML, CSV, databases (SQL), and text extraction from common documents. This includes important information on commits, atomic updates, and real-time search.

Chapter 5, Searching, covers the query syntax, from the basics to Boolean options to more advanced wildcard and fuzzy searches, join queries, and geospatial search.

Chapter 6, Search Relevancy, explains how Solr scores documents for relevancy ranking. We'll review different options to influence the score, called boosting, and apply it to common examples such as boosting recent documents and boosting by a user vote.

Chapter 7, Faceting, shows you how to use Solr's killer feature—faceting. You'll learn about the different types of facets and how to build filter queries for a faceted navigation interface.

Chapter 8, Search Components, explores how to use a variety of valuable search features implemented as Solr search components. This includes result highlighting, query spellcheck, query suggest / complete, result grouping / collapsing, and more.

Chapter 9, Integrating Solr, explores some external integration options to interface with Solr. This includes some language-specific frameworks for Java, JavaScript, Ruby, and PHP, as well as a web crawler, Hadoop, a quick prototyping option, and more.

Chapter 10, Scaling Solr, covers how to tune Solr to get the most out of it. It then introduces how to scale beyond a single instance with SolrCloud.

Chapter 11, Deployment, guides you through deployment considerations, including deploying Solr to Apache Tomcat, logging, security, and setting up Apache ZooKeeper.

Appendix, Quick Reference, serves as a small parameter quick-reference guide you can print to have within reach when you need it.

What you need for this book

The Getting started section in Chapter 1, Quick Starting Solr, explains what you need in detail. In summary, you should obtain:

  • Java 8, a JDK release. Java 7 is fine too. Support for Java 6 was last available in Solr 4.7. More information on this is in Chapter 1, Quick Starting Solr.

  • Apache Solr 4.8.1 is officially the version of Solr this book was written for. Nonetheless, some features of later Solr versions, up to 5.0, are discussed or referenced. In fact, Chapter 1, Quick Starting Solr, orients you to Solr 5, which has a different first-impression experience than its predecessor. Once you get Solr running, you should be able to follow along easily with Solr 5. In Chapter 10, Scaling Solr, some SolrCloud startup commands are a little different in Solr 5, and we've pointed out how they change. The only substantial topics that evolved through the Solr 4 point releases and are not covered in this book are the data-driven schemaless mode and the HTTP API calls for making schema changes.

  • The code supplement to the book. It's not essential, but you'll want it to try some of the examples or to experiment with a sizable amount of real data. See the Downloading the example code section.

Who this book is for

This book is primarily for developers who want to learn how to use Apache Solr in their applications. Only basic programming skills are assumed, although the vast majority of the content should also be useful to readers who have a solid technical foundation but have not yet programmed.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "Typing java -version at a command line will tell you exactly which version of Java you are using, if any."

A block of code is set as follows:

"responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "q": "lcd",
      "indent": "true",
      "wt": "json"
    }
  }
…

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

{
        "id": "9885A004",
        "name": "Canon PowerShot SD500",
        "manu": "Canon Inc.",
        "manu_id_s": "canon",
        "cat": [
          "electronics",
          "camera"
        ],
        "features": [
          "3x zoop, 7.1 megapixel Digital ELPH",
          "movie clips up to 640x480 @30 fps",
          "2.0\" TFT LCD, 118,000 pixels",
          "built in flash, red-eye reduction"
        ],
        "includes": "32MB SD card, USB cable, AV cable, battery",
        "weight": 6.4,
        "price": 329.95,
        "price_c": "329.95,USD",
        "popularity": 7,
        "inStock": true,
        "manufacturedate_dt": "2006-02-13T15:26:37Z",
        "store": "45.19614,-93.90341",
        "_version_": 1500358264225792000
      },
...

Any command-line input or output is written as follows:

>> cd example/exampledocs
>> java -Dc=techproducts -jar post.jar *.xml
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/techproducts/update using content-type application/xml...
POSTing file gb18030-example.xml
POSTing file hd.xml
etc.
14 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/techproducts/update...

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: "Click on the Core Selector drop-down menu and select the techproducts link."

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to , and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

A copy of the code bundle and possibly other information will also be available at http://www.solrenterprisesearchserver.com.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.

Questions

You can contact us at if you are having a problem with any aspect of the book, and we will do our best to address it.