Apache Solr Beginner's Guide

By: Alfredo Serafini

Overview of this book

With over 40 billion web pages, optimizing a search engine's performance is essential.

Solr is an open source enterprise search platform from the Apache Lucene project. Full-text search, faceted search, hit highlighting, dynamic clustering, database integration, and rich document handling are just some of its many features. Solr is highly scalable thanks to its distributed search and index replication.

Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Apache Tomcat or Jetty. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it usable with most popular programming languages. Solr's powerful external configuration allows it to be tailored to many types of application without Java coding, and it has a plugin architecture to support more advanced customization.

With Apache Solr Beginner's Guide you will learn how to configure your own search engine experience. Using real data as an example, you will have the chance to start writing step-by-step, simple, real-world configurations and understand when and where to adopt this technology.

Apache Solr Beginner's Guide will start by letting you explore a simple search over real data. You will then go through a step-by-step description that gives you the chance to explore several practical features. At the end of the book you will see how Solr is used in different real-world contexts.

Using data from public domains like DBpedia, you will define several different configurations, exploring some of the most interesting Solr features, such as faceted search and navigation, auto-suggestion, and rich document indexing. You will see how to configure different analysers for handling different data types, without programming.

You will learn the basics of Solr, focusing on real-world examples and practical configurations.

Time for action – posting some example data


Now that we have prepared the system and installed Solr, we are ready to post some example data as suggested by the default tutorial. In order to check if our installation is working as expected, we need to perform the following steps:

  1. We can easily post some of the example data contained in the /example/exampledocs folder of our Solr installation. First of all, we move to that directory using the following command:

    >> cd %SOLR_DIST%\example\exampledocs (windows)
    >> cd $SOLR_DIST/example/exampledocs (linux, mac)
    
  2. Then we index some data with the provided post.jar library, using the following command:

    >> java -jar post.jar .
    
  3. In the /example/exampledocs subfolder, you can find some documents written in the XML, CSV, or JSON formats that Solr recognizes for indexing data. The post.jar Java library is designed to send every file in a directory (in this case, the current directory) that is written in one of these formats to a running Solr instance, in this case the default one. The data is sent with an HTTP POST request, which explains the name.

  4. Once the example data is indexed, we can again run a query with simple parameters, as shown in the following screenshot:

  5. Here, we can see some results, which are exposed by default in the JSON format. The example data describes items in a hypothetical catalog of electronic devices.
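
To make the idea of "sending a file with an HTTP POST request" more concrete, the following is a minimal sketch of a rough equivalent written with curl. It is not how post.jar is implemented. The update address (port 8983, the collection1 path, and the /update handler), the content types, and the helper names content_type_for and post_file are assumptions made for illustration, so adjust them to your installation.

```shell
#!/bin/sh
# Sketch only: approximates "send each file with an HTTP POST".
# It is not the implementation of post.jar.

# Assumed endpoint: default example port (8983) and the collection1 path.
SOLR_UPDATE_URL="${SOLR_UPDATE_URL:-http://localhost:8983/solr/collection1/update}"

# Hypothetical helper: pick a Content-Type from the file extension.
# Returns non-zero for formats this sketch does not handle.
content_type_for() {
  case "$1" in
    *.xml)  echo "application/xml" ;;
    *.json) echo "application/json" ;;
    *.csv)  echo "text/csv" ;;
    *)      return 1 ;;
  esac
}

# Hypothetical helper: POST one file and ask Solr to commit afterwards.
# Files in an unrecognized format are skipped with a message on stderr.
post_file() {
  ctype=$(content_type_for "$1") || {
    echo "skipping $1 (unrecognized format)" >&2
    return 0
  }
  curl -s "$SOLR_UPDATE_URL?commit=true" \
       -H "Content-Type: $ctype" \
       --data-binary "@$1"
}
```

With Solr running, we could call the helper in a loop over the current directory, for example `for f in *.xml *.json *.csv; do post_file "$f"; done`. The script above only defines the functions, so nothing is sent until we do so.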

What just happened?

As you can see in the screenshot, the results are recognizable as items inside a docs collection. Looking at the first one, we can see that it has both fields containing a single value and fields containing multiple values (the latter are easily recognizable by the [ , ] JSON syntax for lists). The header section of the results contains some general information, for example, the query sent (q=*:*, which basically means "I want to obtain all the documents") and the format chosen for the output (in our case JSON). Moreover, you should note that the number of results is 32, which is bigger than the number of files in that directory. This suggests that we sent more than one document in a single post (we will see this in the later chapters).
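
If we want to check the number of results from the command line instead of reading it in the browser, a small sketch like the following could help. The helper name num_found is hypothetical, and the sample response is a trimmed illustration of the response shape described above (a header echoing the parameters, then the docs collection), not real output. The helper uses a simple text match rather than a real JSON parser, so treat it as a quick check only.

```shell
#!/bin/sh
# Hypothetical helper: read a Solr JSON response on stdin and print the
# value of "numFound" (the total number of matching documents).
# It relies on a plain text match, not on a JSON parser.
num_found() {
  sed -n 's/.*"numFound":[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Trimmed, illustrative response shape (not real output; docs omitted).
sample='{"responseHeader":{"status":0,"params":{"q":"*:*","wt":"json"}},"response":{"numFound":32,"start":0,"docs":[]}}'

# Prints the document count found in the sample.
printf '%s\n' "$sample" | num_found
```

Against a running instance, the same helper could be fed by curl, for example `curl -s 'http://localhost:8983/solr/collection1/select?q=*:*&wt=json' | num_found`, assuming the default address of the example server.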

Lastly, you can see in the address that we are actually querying over a subpath called collection1. This is the name of the default collection where we have indexed our example data. In the next chapter, we will start using our first collection instead of this example one.
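
The role of the collection name in the address can be sketched as follows. The base address (localhost on port 8983), the /select handler, and the helper name select_url are assumptions for illustration; only the collection1 subpath and the q=*:* query come from the example above.

```shell
#!/bin/sh
# Assumed base address of the default example server.
SOLR_BASE="${SOLR_BASE:-http://localhost:8983/solr}"

# Hypothetical helper: build the query address for a collection and a query.
# The query is inserted unescaped, so it must already be URL-safe.
select_url() {
  printf '%s/%s/select?q=%s&wt=json\n' "$SOLR_BASE" "$1" "$2"
}

# The collection name is the path segment right after the base address.
select_url collection1 '*:*'
```

Changing the first argument is all that is needed to point the same query at a different collection.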