Apache Solr Beginner's Guide

By: Alfredo Serafini

Overview of this book

With over 40 billion web pages, optimizing a search engine's performance is essential.

Solr is an open source enterprise search platform from the Apache Lucene project. Full-text search, faceted search, hit highlighting, dynamic clustering, database integration, and rich document handling are just some of its many features. Solr is highly scalable thanks to its distributed search and index replication.

Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Apache Tomcat or Jetty. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it usable with most popular programming languages. Solr's powerful external configuration allows it to be tailored to many types of application without Java coding, and it has a plugin architecture to support more advanced customization.

With Apache Solr Beginner's Guide you will learn how to configure your own search engine experience. Using real data as an example, you will have the chance to start writing simple, real-world configurations step by step and understand when and where to adopt this technology.

Apache Solr Beginner's Guide will start by letting you explore a simple search over real data. You will then go through a step-by-step description that gives you the chance to explore several practical features. At the end of the book you will see how Solr is used in different real-world contexts.

Using data from public domains like DBpedia, you will define several different configurations, exploring some of the most interesting Solr features, such as faceted search and navigation, auto-suggestion, and rich document indexing. You will see how to configure different analyzers for handling different data types, without programming.

You will learn the basics of Solr, focusing on real-world examples and practical configurations.

Installing and testing Solr


Once Java is correctly installed, it's time to install Solr and run some initial tests. To simplify things, we will adopt the default distribution that you can download from the official page: http://lucene.apache.org/solr/ (the current version at the time of writing is Version 4.5). The zipped package can be extracted and copied to a folder of your choice.
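If you prefer to script the extraction step, the following is a minimal sketch in Python rather than a required procedure. The function name `extract_distribution` and both paths are hypothetical; replace them with the archive you actually downloaded and the folder you chose.

```python
import zipfile
from pathlib import Path


def extract_distribution(archive_path, target_dir):
    """Unzip a downloaded distribution and return its top-level entries.

    Both arguments are placeholders chosen for illustration; pass the
    archive you downloaded and the folder of your choice.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        names = archive.namelist()
    # Zip entries always use "/" as separator, whatever the platform.
    return sorted({name.split("/")[0] for name in names if name})
```

The returned list makes it easy to see which top-level folder the archive created before you move it elsewhere.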

Once extracted, the Solr standard distribution will contain the folders shown in the following screenshot:

We will start Solr from here; even if we don't need to use all the libraries and examples obtained with the distribution, you can continue exploring the folders with your own examples after reading this book. Some of the folders are as follows:

  • /solr: This represents a simple single-core configuration

  • /multicore: This represents a multiple core (multicore) configuration example

  • /example-DIH: This provides examples for the data import handler capabilities

  • /exampledocs: This contains some toy data to play with
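A quick way to confirm that the extraction went well is to check that the folders listed above are present. The sketch below assumes they sit directly inside the /example folder of the distribution. The function name `missing_example_folders` is hypothetical and introduced only for illustration.

```python
from pathlib import Path

# Folder names taken from the list above; their location directly inside
# the example folder is an assumption about the distribution layout.
EXPECTED_FOLDERS = ["solr", "multicore", "example-DIH", "exampledocs"]


def missing_example_folders(example_dir):
    """Return the expected folder names that are not found in example_dir."""
    base = Path(example_dir)
    return [name for name in EXPECTED_FOLDERS if not (base / name).is_dir()]
```

An empty list means that all four folders were found. Any names returned point to an incomplete extraction or a different layout.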

For the moment, we will ignore the folders external to /. These folders will be useful later, when we need to use third-party libraries.

The simplest way to run the Solr instance is to use the start.jar launcher, which we can find in the /example folder. For convenience, it's useful to define a new environment variable, SOLR_DIST, that points to the absolute path /the-path-of-solr-distribution/example. To use the examples in the simplest way, I suggest putting the unzipped Solr distribution at /SolrStarterBook/solr, where SolrStarterBook is the folder containing the complete code examples for this book. We can easily create this new environment variable in the same way we created the CLASSPATH one.
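To make the role of SOLR_DIST concrete, here is a small sketch that reads the variable and builds the launch command without running it. It assumes the launcher is the start.jar file shipped in the /example folder of the Solr 4.x distribution and that `java` is on the PATH. The function name `solr_launch_command` is hypothetical.

```python
import os
from pathlib import Path


def solr_launch_command(environ=None):
    """Build the command and working directory used to start Solr.

    Assumes SOLR_DIST points at the example folder and that the launcher
    there is named start.jar (an assumption about the 4.x distribution).
    """
    environ = os.environ if environ is None else environ
    solr_dist = environ.get("SOLR_DIST")
    if not solr_dist:
        raise RuntimeError("SOLR_DIST is not set; point it at the example folder")
    # The launcher is run from inside the example folder, so the command
    # uses a relative jar name and the folder is returned as the cwd.
    return ["java", "-jar", "start.jar"], Path(solr_dist)
```

Passing the two returned values to `subprocess.run(command, cwd=working_dir)` would start the server. From a terminal, the equivalent is to change into the folder named by SOLR_DIST and run `java -jar start.jar`.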