Apache Solr Beginner's Guide

By: Alfredo Serafini

Overview of this book

With over 40 billion web pages, optimizing a search engine's performance is essential.

Solr is an open source enterprise search platform from the Apache Lucene project. Full-text search, faceted search, hit highlighting, dynamic clustering, database integration, and rich document handling are just some of its many features. Solr is highly scalable thanks to its distributed search and index replication.

Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Apache Tomcat or Jetty. It uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it usable from most popular programming languages. Solr's powerful external configuration allows it to be tailored to many types of application without Java coding, and its plugin architecture supports more advanced customization.

With Apache Solr Beginner's Guide you will learn how to configure your own search engine experience. Using real data as an example, you will write simple, real-world configurations step by step and understand when and where to adopt this technology.

Apache Solr Beginner's Guide starts by letting you explore a simple search over real data. You will then follow a step-by-step description that lets you explore several practical features. At the end of the book you will see how Solr is used in different real-world contexts.

Using data from public sources such as DBpedia, you will define several different configurations, exploring some of the most interesting Solr features, such as faceted search and navigation, auto-suggestion, and rich document indexing. You will see how to configure different analysers for handling different data types, without programming.

You will learn the basics of Solr, focusing on real-world examples and practical configurations.

Time for action – starting Solr for the first time


Ok, now it's time to start Solr for the first time.

  1. From the terminal (on Windows, Linux, or Mac), move to the Solr distribution directory:

    >> cd %SOLR_DIST% (Windows)
    >> cd $SOLR_DIST (Mac, Linux)
    
  2. Change to the example subdirectory, and then start Solr using the following commands:

    >> cd example
    >> java -jar start.jar
    
  3. We should see output similar to that shown in the following screenshot:

  4. You will quickly become familiar with the line highlighted in this output, as it is easy to recognize (it ends in 0.0.0.0:8983). If we have not noticed any errors before it in the output, our Solr instance is running correctly. While Solr is running, you can leave the terminal window open and minimized so that you can check what happened whenever you need to, in particular whether any errors occurred. On production systems this is unnecessary, since scripts will automate starting and stopping Solr, but it's useful for our testing.

  5. When you wish to stop Solr, simply press the Ctrl + C combination in the terminal window.
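On Mac and Linux, steps 1 and 2 can be wrapped in a couple of shell functions. The sketch below assumes that SOLR_DIST points to the unzipped Solr distribution, as in step 1; the names solr_example_dir and start_solr are hypothetical helpers, not part of Solr. It only checks that start.jar is where we expect it before running the same java -jar start.jar command.

```shell
# Hypothetical helpers (not shipped with Solr) for steps 1 and 2 on Mac/Linux.
# Assumes SOLR_DIST points to the unzipped Solr distribution.

# Print the directory that start.jar has to be launched from.
solr_example_dir() {
  printf '%s/example\n' "${SOLR_DIST:?SOLR_DIST is not set}"
}

# Start the bundled Jetty in the foreground from the example directory.
# The subshell keeps the caller's current directory unchanged;
# press Ctrl + C to stop Solr.
start_solr() {
  local dir
  dir="$(solr_example_dir)" || return 1
  if [ ! -f "$dir/start.jar" ]; then
    echo "start.jar not found in $dir" >&2
    return 1
  fi
  (cd "$dir" && java -jar start.jar)
}
```

After sourcing these functions, running start_solr behaves like step 2, and Ctrl + C still stops Solr. Because the cd happens in a subshell, the terminal stays in the directory you started from.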

What just happened?

The start.jar launcher is a small Java library that starts an embedded Jetty container to run the Solr application. By default, this application runs on port 8983. Jetty is a lightweight container that was adopted for distributing Solr because of its simplicity and small memory footprint. Although Solr is distributed as a standard Java web application (you can find solr.war under the /example/webapps folder) whose WAR file can be deployed to any application server (such as Tomcat, JBoss, and others), the standard, preferred way to use it is with the embedded Jetty instance. We will therefore start with the local Jetty instance bundled with Solr, so that you can become familiar with the platform and its services using a standard installation on which you can also follow the tutorials on your own.

Tip

Note that in our example we need to change the current directory to /example, which is included in the folder unzipped from the standard distribution archive. The start.jar tool starts the local Jetty instance and accepts parameters for changing the Solr configuration. If it does not receive any particular option (as in our case), it looks for the Solr configuration in the default example, so it needs to be started from that specific directory. Similarly, the post.jar tool can be run from any directory that contains the data to be sent to Solr.

If you want to change the default port for Jetty (for example, if port 8983 is already in use by another program), look at the jetty.xml file in the [SOLR_DIST]/example/etc directory, where [SOLR_DIST] stands for %SOLR_DIST% on Windows or $SOLR_DIST on Mac and Linux. If you also want some control over the logging in the terminal (it can be very annoying to find errors among a huge number of lines scrolling past quickly), look for the logging.properties file in the same directory.
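As an alternative to editing jetty.xml, the port can usually be overridden on the command line: in the Solr 4.x example distribution, jetty.xml reads the port from a jetty.port system property that defaults to 8983. Please check your own copy of jetty.xml before relying on this. The hypothetical helper solr_start_cmd below only validates a port number and prints the corresponding command line, which keeps it easy to test.

```shell
# Hypothetical helper (not shipped with Solr): print the command that starts
# the bundled Jetty on a custom port. Assumes example/etc/jetty.xml reads the
# port from the jetty.port system property (default 8983); check your copy.
solr_start_cmd() {
  local port="${1:-8983}"
  # Reject anything that is not a plain decimal number.
  case "$port" in
    ''|*[!0-9]*) echo "invalid port: $port" >&2; return 1 ;;
  esac
  # Reject numbers outside the TCP port range (1-65535).
  if [ "${#port}" -gt 5 ] || [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
    echo "port out of range: $port" >&2
    return 1
  fi
  printf 'java -Djetty.port=%s -jar start.jar\n' "$port"
}
```

From the [SOLR_DIST]/example directory, cmd="$(solr_start_cmd 8984)" && eval "$cmd" would start Solr on port 8984 (eval is acceptable here because the function only emits digits for the port), and the admin interface would then be at http://localhost:8984/solr/#/. Typing java -Djetty.port=8984 -jar start.jar directly has the same effect and also works from a Windows terminal.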

Taking a glance at the Solr interface

Now that the server is running, let's see how the Solr web application looks in a browser by opening this URL: http://localhost:8983/solr/#/. We will obtain the default home screen for the Solr web application, as shown in the following screenshot:

Note that because the default installation does not redirect from the root path to the address above, a very common mistake is to point the browser to http://localhost:8983/, which produces the error shown in the following screenshot:

This error is harmless for our purposes; if you see it, simply check that you are using the correct address.

We can execute our first query from the default admin screen of the default collection1 core, at http://localhost:8983/solr/#/collection1/query. (We will introduce the concept of a core in the next chapter, so please be patient if some things are not yet fully explained.)

We will obtain XML results that, as expected, contain no documents, because we have not indexed any data yet.
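The same empty result can be obtained outside the browser by calling the core's /select request handler directly, for example with curl. The sketch below assumes the default host and port, the Solr 4.x URL layout (/solr/CORE/select), and the default collection1 core; solr_select_url is a hypothetical helper that requests XML output with wt=xml and uses the match-all query *:* by default.

```shell
# Hypothetical helper (not shipped with Solr): build a /select query URL for a
# core on the default local instance (localhost:8983, Solr 4.x URL layout).
# It does NOT URL-encode the query, so keep to simple queries without
# spaces or '&'.
solr_select_url() {
  local core="${1:-collection1}"
  local query="${2:-*:*}"
  printf 'http://localhost:8983/solr/%s/select?q=%s&wt=xml\n' "$core" "$query"
}
```

With Solr running, curl "$(solr_select_url)" should return an XML response whose result element reports numFound="0", since nothing has been indexed yet. For queries containing spaces or other special characters, curl can do the encoding instead: curl -G "http://localhost:8983/solr/collection1/select" --data-urlencode "q=title:apache solr".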