Book Image

ElasticSearch Server

Book Image

ElasticSearch Server

Overview of this book

ElasticSearch is an open source search server built on Apache Lucene. It was built to provide a scalable search solution with built-in support for near real-time search and multi-tenancy.Jumping into the world of ElasticSearch by setting up your own custom cluster, this book will show you how to create a fast, scalable, and flexible search solution. By learning the ins-and-outs of data indexing and analysis, "ElasticSearch Server" will start you on your journey to mastering the powerful capabilities of ElasticSearch. With practical chapters covering how to search data, extend your search, and go deep into cluster administration and search analysis, this book is perfect for those new and experienced with search servers.In "ElasticSearch Server" you will learn how to revolutionize your website or application with faster, more accurate, and flexible search functionality. Starting with chapters on setting up your own ElasticSearch cluster and searching and extending your search parameters you will quickly be able to create a fast, scalable, and completely custom search solution.Building on your knowledge further you will learn about ElasticSearch's query API and become confident using powerful filtering and faceting capabilities. You will develop practical knowledge on how to make use of ElasticSearch's near real-time capabilities and support for multi-tenancy.Your journey then concludes with chapters that help you monitor and tune your ElasticSearch cluster as well as advanced topics such as shard allocation, gateway configuration, and the discovery module.
Table of Contents (17 chapters)
ElasticSearch Server
Credits
About the Authors
Acknowledgement
Acknowledgement
About the Reviewers
www.PacktPub.com
Preface
Index

Running ElasticSearch


Let's run our first instance. Go to the bin directory and run the following command from the command line:

./elasticsearch –f (Linux or OS X)
elasticsearch.bat –f (Windows)

The -f option tells ElasticSearch that the program should not be detached from the console and should be run in the foreground. This allows us to see the diagnostic messages generated by the program and stop it by pressing Ctrl + C. The other option is -p, which tells ElasticSearch that the identifier of the process should be written to the file pointed by this parameter. This can be executed by using additional monitoring software or admin scripts.

Congratulations, we now have our ElasticSearch instance up and running! During its work, a server usually uses two port numbers: one for communication with the REST API by using the HTTP protocol and the second one for the transport module used for communication in a cluster. The default port for the HTTP API is 9200, so we can check the search readiness by pointing a web browser at http://127.0.0.1:9200/. The browser should show a code snippet similar to the following:

Tip

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

{
  "ok" : true,
  "status" : 200,
  "name" : "Donald Pierce",
  "version" : {
    "number" : "0.20.0"
  },
  "tagline" : "You Know, for Search"
}

The output is structured as a JSON (JavaScript Object Notation ) object. We will use this notation in more complex requests too. If you are not familiar with JSON, please take a minute and read the article available at http://en.wikipedia.org/wiki/JSON.

Note

Note that ElasticSearch is smart. If the default port is not available, the engine binds to the next free port. You can find information about this on the console, during booting:

[2012-09-02 22:45:17,101][INFO ][http] [Red Lotus] bound_address {inet[/0:0:0:0:0:0:0:0%0:9200]}, publish_address {inet[/192.168.1.101:9200]}

Note the fragment with [http]. ElasticSearch uses a few ports for various tasks. The interface that we are using is handled by the HTTP module.

Now we will use the cURL program. For example, our query can be executed as follows:

curl –XGET http://127.0.0.1:9200/_cluster/health?pretty

The -X parameter is a request method. The default value is GET (so, in this example, we can omit this parameter). Do not worry about the GET value for now, we will describe it in more detail later in this chapter.

Note the ?pretty parameter. As a standard, the API returns information in a JSON object in which the new line signs are omitted. This parameter forces ElasticSearch to add a new line character to the response, making the response more human-friendly. You can try running the preceding query with and without the ?pretty parameter to see the difference.

ElasticSearch is useful in small and medium-sized applications, but it is built with large installations in mind. So now we will set up our big, two-node cluster. Unpack the ElasticSearch archive in a different directory and run the second instance. If we look into the log, we see something similar to the following:

 [2012-09-09 11:23:05,604][INFO ][cluster.service          ] [Orbit] detected_master [Bova][fo2dHTS3TlWKlJiDnQOKAg][inet[/192.168.1.101:9300]], added {[Bova][fo2dHTS3TlWKlJiDnQOKAg][inet[/192.168.1.101:9300]],}, reason: zen-disco-receive(from master [[Bova][fo2dHTS3TlWKlJiDnQOKAg][inet[/192.168.1.101:9300]]])

This means that our second instance (named Orbit) found the previously running instance (named Bova). ElasticSearch automatically formed a new, two-node cluster.