Apache Solr Beginner's Guide

By: Alfredo Serafini

Overview of this book

With over 40 billion web pages, optimizing a search engine's performance is essential.

Solr is an open source enterprise search platform from the Apache Lucene project. Full-text search, faceted search, hit highlighting, dynamic clustering, database integration, and rich document handling are just some of its many features. Solr is highly scalable thanks to its distributed search and index replication.

Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Apache Tomcat or Jetty. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it usable with most popular programming languages. Solr's powerful external configuration allows it to be tailored to many types of application without Java coding, and it has a plugin architecture to support more advanced customization.

With Apache Solr Beginner's Guide you will learn how to configure your own search engine experience. Using real data as an example, you will write simple, real-world configurations step by step and understand when and where to adopt this technology.

Apache Solr Beginner's Guide starts by letting you explore a simple search over real data. You will then go through a step-by-step description that gives you the chance to explore several practical features. At the end of the book you will see how Solr is used in different real-world contexts.

Using data from public sources such as DBpedia, you will define several different configurations, exploring some of the most interesting Solr features, such as faceted search and navigation, auto-suggestion, and rich document indexing. You will see how to configure different analyzers for handling different data types, without programming.

You will learn the basics of Solr, focusing on real-world examples and practical configurations.
Table of Contents (19 chapters)
Apache Solr Beginner's Guide
Credits
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Preface
Index

How will we use Solr?


The main focus of this book is a gradual introduction to Solr that a beginner can follow without writing much code at first, although we will introduce some coding near the end. I hope this approach makes it easy to share what you read, and your own ideas, with your teammates if you want to, and leaves you free to find your own way of adopting this technology.

I would also like to suggest adopting Solr at an early stage of development, as a prototyping tool. We will see that indexing data is easy, even when we do not yet have a final design for our data model. This means that filters and faceting capabilities can be provided from the very beginning of the user experience design. A Solr configuration can be improved at every stage of an incremental development (not necessarily when all the actual data already exists, as you might think) without "breaking" existing functionality, and it gives us a fast view of the data that is close to the user's perspective. This is useful for building a working preview for our customers that is flexible enough to be improved quickly later.

The scripts in the repository for the book examples, which we will use in the next chapters, rely on a SOLR_DIST environment variable that we have defined. The code can be downloaded as a zipped package from https://bitbucket.org/seralf/solrstarterbook. If you are familiar with Mercurial, you can also get it directly as source. Some of the scripts that download the toy data for our indexing tests are written in Scala, so you can add the Scala library to the system CLASSPATH variable for your convenience, although this is not required. We will discuss our scripts and examples later in Chapter 3, Indexing Example Data from DBpedia – Paintings.
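Before running any of the scripts, it can help to confirm that SOLR_DIST is actually set and points at a real directory. The following is a minimal Python sketch of such a check. It is not part of the book's repository (whose scripts use Scala), and the function name `resolve_solr_dist` is hypothetical. The only assumption is that SOLR_DIST holds the path of the unpacked Solr distribution.

```python
import os
from pathlib import Path


def resolve_solr_dist(env=None):
    """Return the directory named by the SOLR_DIST variable.

    `env` defaults to the process environment; passing a plain dict
    makes the function easy to try out without changing your shell.
    Raises RuntimeError if the variable is unset or is not a directory.
    """
    env = os.environ if env is None else env
    value = env.get("SOLR_DIST")
    if not value:
        raise RuntimeError(
            "SOLR_DIST is not set; point it at your unpacked Solr distribution"
        )
    path = Path(value).expanduser()
    if not path.is_dir():
        raise RuntimeError(f"SOLR_DIST points at {path}, which is not a directory")
    return path


if __name__ == "__main__":
    try:
        print(f"Using Solr distribution at {resolve_solr_dist()}")
    except RuntimeError as err:
        print(f"Configuration problem: {err}")
```

Run as a script, it prints either the resolved directory or a short description of what is wrong, which is usually enough to tell a missing variable from a mistyped path.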

Pop quiz

Q1. Which of the following are features of Solr?

  1. Full-text and faceted search

  2. Web crawling and site indexing

  3. Spellchecking and autosuggestion

Q2. From which of these options can we obtain a list of all the documents in the example?

  1. Using the query q=*:*

  2. Using the query q=documents:*

  3. Using the query q=*:all

Q3. Why does the standard Solr distribution include a working Jetty instance?

  1. Because Solr can't be run without Jetty

  2. Because we can't deploy the Solr war (web application) into other containers/application servers, such as Tomcat or JBoss

  3. Because the Solr war needs to be run inside a web container, such as Jetty

Q4. What is cURL?

  1. cURL is a program used for parsing data from a remote URL, using the HTTP protocol

  2. cURL is a command line tool for transferring data with URL syntax, using the HTTP protocol

  3. cURL is a command line tool for sending queries to Solr, using the HTTP protocol

Q5. Which of the following statements are not true?

  1. The Solr application exposes full-text and faceted search capabilities

  2. The Solr application can be used for adding full-text search capabilities to a database system

  3. Solr can be used as an embedded framework in a Java application