Apache Solr Beginner's Guide

By: Alfredo Serafini

Overview of this book

With over 40 billion web pages, optimizing a search engine's performance is essential.

Solr is an open source enterprise search platform from the Apache Lucene project. Full-text search, faceted search, hit highlighting, dynamic clustering, database integration, and rich document handling are just some of its many features. Solr is highly scalable thanks to its distributed search and index replication.

Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Apache Tomcat or Jetty. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it usable with most popular programming languages. Solr's powerful external configuration allows it to be tailored to many types of application without Java coding, and it has a plugin architecture to support more advanced customization.

With Apache Solr Beginner's Guide you will learn how to configure your own search engine experience. Using real data as an example, you will have the chance to write simple, real-world configurations step by step and understand when and where to adopt this technology.

Apache Solr Beginner's Guide will start by letting you explore a simple search over real data. You will then go through a step-by-step description that gives you the chance to explore several practical features. At the end of the book you will see how Solr is used in different real-world contexts.

Using data from public sources such as DBpedia, you will define several different configurations, exploring some of the most interesting Solr features, such as faceted search and navigation, auto-suggestion, and rich document indexing. You will see how to configure different analysers for handling different data types, without programming.

You will learn the basics of Solr, focusing on real-world examples and practical configurations.

Time for action – testing Solr with cURL


If you look at the top of the previous screenshot containing the results, you will recognize the address http://localhost:8983/solr/collection1/query?q=*:*&wt=json&indent=true. It represents a specific query together with its parameters. You can copy this address and paste it directly into the browser to obtain the same results as before, without going through the web interface. Note that the browser will encode some characters when sending the query via HTTP. For example, the character : will be encoded as %3A. This will be one of our methods for testing queries directly. While the browser is more comfortable in many cases, a command-line approach is surprisingly clearer in many others, and I want you to be familiar with both.
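The encoding step mentioned above can be reproduced on the command line. The following is only a minimal sketch, assuming a bash shell and ASCII input; the function name urlencode is hypothetical and is not part of cURL or Solr. It is also stricter than a typical browser: it encodes every character outside the RFC 3986 unreserved set, so the * character is encoded too.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percent-encode a string for use as a URL query value.
# Assumption: ASCII input only. Letters, digits and - . _ ~ are kept as-is;
# every other character is replaced by % followed by its hex code.
urlencode() {
  local input="$1" output="" char i
  for (( i = 0; i < ${#input}; i++ )); do
    char="${input:i:1}"
    case "$char" in
      [a-zA-Z0-9.~_-]) output+="$char" ;;
      # A leading quote makes printf use the character's numeric code
      *) printf -v char '%%%02X' "'$char"; output+="$char" ;;
    esac
  done
  printf '%s\n' "$output"
}

urlencode '*:*'   # prints %2A%3A%2A
```

Solr decodes such values on arrival, so q=%2A%3A%2A and q=*:* should describe the same query.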

This can easily be done by running the same query in the browser interface and also with the cURL tool. You will become familiar with this process after executing it a few times, and it helps you focus on how the data is actually transferred over HTTP. This gives us the best start for understanding how to write code that accesses the HTTP services directly, which will be useful for writing frontends with JavaScript or other languages.

Tip

You can download the latest cURL version for your platform/architecture from here: http://curl.haxx.se/download.html.

Please remember that on Linux systems it is better to use the package manager of your distribution (yum, apt, and similar ones). For Windows users, it's important to add the cURL executable to the PATH environment variable, as we did previously for Java. This makes it usable from the command line without having to prepend the absolute path every time.
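A quick way to verify that the shell can find cURL is sketched here. It assumes a bash shell on Linux or Mac OS X; the function name check_curl is hypothetical. On the Windows command prompt, the where curl command plays a similar role.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the curl executable is reachable via PATH.
check_curl() {
  if command -v curl >/dev/null 2>&1; then
    echo "cURL found at: $(command -v curl)"
    return 0
  else
    echo "cURL not found on PATH; install it or add its directory to PATH"
    return 1
  fi
}

# "|| true" keeps a script going even when cURL is missing
check_curl || true
```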

We can execute the following query with cURL on the command line in the same way we ran it before:

>> curl -X GET "http://localhost:8983/solr/collection1/query?q=*:*&wt=json&indent=true"

From the next chapter onwards, we will use the browser and cURL interchangeably, choosing the clearest method for each specific case.

What just happened?

When cURL is configured, the result of the query will be the same as the one seen in the browser. We generally use cURL by putting the HTTP request address, including its parameters, in double quotes, and we will explicitly use the -X GET parameter to make the requests clearer. Saving the cURL requests in .txt files permits us, for example, to fully reconstruct the exact queries sent. We can also send POST requests with cURL, which is very useful for performing indexing and administrative tasks (for example, a delete action) from the command line.
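One possible way to keep such a record is sketched below, assuming a bash shell. The names solr_request, REQUEST_LOG, and RUN_CURL are hypothetical and exist only for this example. The POST line assumes Solr's standard update handler at /update and its XML delete message, neither of which has been introduced in the text so far, and EXAMPLE_ID is a placeholder rather than a real document. By default the sketch only writes the commands to the log file; it sends them only when RUN_CURL=1 is set.

```shell
#!/usr/bin/env bash
# Hypothetical names for this sketch: REQUEST_LOG, RUN_CURL, solr_request.
REQUEST_LOG="${REQUEST_LOG:-solr-requests.txt}"

solr_request() {
  local method="$1" url="$2" body="${3:-}"
  local -a cmd=(curl -X "$method" "$url")
  if [ -n "$body" ]; then
    # Assumption: the body is an XML message for Solr's update handler
    cmd+=(-H "Content-Type: text/xml" --data-binary "$body")
  fi
  # %q quotes each argument so the logged line can be pasted back into bash
  printf '%q ' "${cmd[@]}" >> "$REQUEST_LOG"
  printf '\n' >> "$REQUEST_LOG"
  # Dry run by default: the request is sent only when RUN_CURL=1
  if [ "${RUN_CURL:-0}" = "1" ]; then
    "${cmd[@]}"
  fi
}

# The same GET query used in this section
solr_request GET "http://localhost:8983/solr/collection1/query?q=*:*&wt=json&indent=true"

# A delete action sent with POST; EXAMPLE_ID is a placeholder document id
solr_request POST "http://localhost:8983/solr/collection1/update?commit=true" \
  "<delete><id>EXAMPLE_ID</id></delete>"
```

Each line in the log file is a complete cURL command, so a past request can be replayed by pasting that line into the shell.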