ElasticSearch Server

Overview of this book

ElasticSearch is an open source search server built on Apache Lucene. It was built to provide a scalable search solution with built-in support for near real-time search and multi-tenancy.

Starting with setting up your own custom cluster, this book will show you how to create a fast, scalable, and flexible search solution. By teaching the ins and outs of data indexing and analysis, "ElasticSearch Server" will start you on your journey to mastering the powerful capabilities of ElasticSearch. With practical chapters covering how to search data, extend your search, and go deep into cluster administration and search analysis, this book suits both newcomers and readers experienced with search servers.

In "ElasticSearch Server" you will learn how to revolutionize your website or application with faster, more accurate, and more flexible search functionality. Starting with chapters on setting up your own ElasticSearch cluster, searching, and extending your search parameters, you will quickly be able to create a fast, scalable, and completely custom search solution.

Building on that knowledge, you will learn about ElasticSearch's query API and become confident using its powerful filtering and faceting capabilities. You will develop practical knowledge of how to make use of ElasticSearch's near real-time capabilities and support for multi-tenancy.

Your journey then concludes with chapters that help you monitor and tune your ElasticSearch cluster, as well as advanced topics such as shard allocation, gateway configuration, and the discovery module.

Running ElasticSearch

Let's run our first instance. Go to the bin directory and run the following command from the command line:

./elasticsearch -f (Linux or OS X)
elasticsearch.bat -f (Windows)

The -f option tells ElasticSearch not to detach from the console and to run in the foreground. This allows us to see the diagnostic messages generated by the program and to stop it by pressing Ctrl + C. The other option is -p, which tells ElasticSearch to write the identifier of the process to the file specified by this parameter. Monitoring software or admin scripts can then use this file.
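As an illustration of how an admin script might use the file written by the -p option, here is a minimal Python sketch. It assumes the file contains only the process identifier as text, and it works on POSIX systems only (Linux or OS X). The function names read_pid and is_running are our own and are not part of ElasticSearch.

```python
import os
from pathlib import Path


def read_pid(pid_file):
    """Return the process ID stored in pid_file as an integer.

    Assumption: the file holds just the PID as text, possibly followed
    by a newline.
    """
    return int(Path(pid_file).read_text().strip())


def is_running(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        # Signal 0 delivers nothing; it only checks that the PID is valid.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True
```

For example, if the server was started with -p /tmp/es.pid (a hypothetical path), a monitoring script could call is_running(read_pid("/tmp/es.pid")) to check whether the instance is still alive.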

Congratulations, we now have our ElasticSearch instance up and running! While running, the server usually uses two port numbers: one for communication with the REST API over HTTP, and a second one for the transport module used for communication within a cluster. The default port for the HTTP API is 9200, so we can check that the server is ready by pointing a web browser at http://127.0.0.1:9200/. The browser should show output similar to the following:

{
  "ok" : true,
  "status" : 200,
  "name" : "Donald Pierce",
  "version" : {
    "number" : "0.20.0"
  },
  "tagline" : "You Know, for Search"
}

The output is structured as a JSON (JavaScript Object Notation) object. We will use this notation in more complex requests too. If you are not familiar with JSON, please take a minute and read the article available at http://en.wikipedia.org/wiki/JSON.
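Because the response is plain JSON, any JSON library can read it. The following Python sketch parses the sample response shown earlier and extracts the node name and version number. The summarize function is a name we made up for illustration; the field names come from the sample output and may differ in other ElasticSearch versions.

```python
import json

# The sample response from this section, stored as a string.
SAMPLE_RESPONSE = """
{
  "ok" : true,
  "status" : 200,
  "name" : "Donald Pierce",
  "version" : {
    "number" : "0.20.0"
  },
  "tagline" : "You Know, for Search"
}
"""


def summarize(body):
    """Return (node name, version number) from a root endpoint response."""
    info = json.loads(body)
    return info["name"], info["version"]["number"]


name, version = summarize(SAMPLE_RESPONSE)
print(f"Node {name} is running version {version}")
```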

Note

Note that ElasticSearch is smart. If the default port is not available, the engine binds to the next free port. You can find information about this on the console during startup:

[2012-09-02 22:45:17,101][INFO ][http] [Red Lotus] bound_address {inet[/0:0:0:0:0:0:0:0%0:9200]}, publish_address {inet[/192.168.1.101:9200]}

Note the fragment with [http]. ElasticSearch uses a few ports for various tasks. The interface that we are using is handled by the HTTP module.
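If a script needs to discover which HTTP port was actually bound, one option is to read it from that log line. The Python sketch below does this with a regular expression. It assumes the log format matches the sample line above, which may vary between versions; the function name http_publish_address is hypothetical.

```python
import re

# The sample log line from this section.
LOG_LINE = (
    "[2012-09-02 22:45:17,101][INFO ][http] [Red Lotus] "
    "bound_address {inet[/0:0:0:0:0:0:0:0%0:9200]}, "
    "publish_address {inet[/192.168.1.101:9200]}"
)

# Matches the [http] module tag, tolerating padding inside the brackets.
HTTP_TAG_RE = re.compile(r"\[http\s*\]")

# Captures the host and port from the publish_address fragment.
PUBLISH_RE = re.compile(
    r"publish_address \{inet\[/(?P<host>[^\]]+):(?P<port>\d+)\]\}"
)


def http_publish_address(line):
    """Return (host, port) for an HTTP module log line, or None."""
    if not HTTP_TAG_RE.search(line):
        return None
    match = PUBLISH_RE.search(line)
    if match is None:
        return None
    return match.group("host"), int(match.group("port"))


print(http_publish_address(LOG_LINE))
```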

Now we will use the cURL program to send requests from the command line. For example, a query for the cluster health can be executed as follows:

curl -XGET http://127.0.0.1:9200/_cluster/health?pretty

The -X parameter specifies the request method. The default value is GET (so, in this example, we can omit this parameter). Do not worry about the GET value for now; we will describe it in more detail later in this chapter.

Note the ?pretty parameter. By default, the API returns information as a JSON object in which newline characters are omitted. This parameter forces ElasticSearch to add newline characters to the response, making it more human-friendly. You can try running the preceding query with and without the ?pretty parameter to see the difference.
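To get a feel for the difference without a running server, the following Python sketch prints the same object in a compact form and in an indented form. The values in the health dictionary are hypothetical, and the exact layout ElasticSearch produces may differ from what the json module emits; the sketch only illustrates the idea.

```python
import json

# Hypothetical cluster health values, used only for illustration.
health = {
    "cluster_name": "elasticsearch",
    "status": "green",
    "number_of_nodes": 1,
}

# Compact form: everything on one line, similar to a response without ?pretty.
compact = json.dumps(health, separators=(",", ":"))

# Indented form: one field per line, similar to a response with ?pretty.
pretty = json.dumps(health, indent=2)

print(compact)
print(pretty)
```

Both strings carry exactly the same data; only the whitespace differs, which is why programs can consume either form.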

ElasticSearch is useful in small and medium-sized applications, but it is built with large installations in mind. So now we will set up our big, two-node cluster. Unpack the ElasticSearch archive in a different directory and run the second instance. If we look into the log, we see something similar to the following:

[2012-09-09 11:23:05,604][INFO ][cluster.service          ] [Orbit] detected_master [Bova][fo2dHTS3TlWKlJiDnQOKAg][inet[/192.168.1.101:9300]], added {[Bova][fo2dHTS3TlWKlJiDnQOKAg][inet[/192.168.1.101:9300]],}, reason: zen-disco-receive(from master [[Bova][fo2dHTS3TlWKlJiDnQOKAg][inet[/192.168.1.101:9300]]])

This means that our second instance (named Orbit) found the previously running instance (named Bova). ElasticSearch automatically formed a new, two-node cluster.
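Besides reading the log, we can confirm the cluster size through the cluster health endpoint used earlier. The Python sketch below fetches the health document and compares its number_of_nodes field with the expected value. The function names, the default base URL, and the timeout are our own assumptions; adjust the URL if your instance is bound to a different address or port.

```python
import json
from urllib.request import urlopen


def fetch_cluster_health(base_url="http://127.0.0.1:9200", timeout=5.0):
    """Fetch and decode the cluster health document from a running node."""
    with urlopen(f"{base_url}/_cluster/health", timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def has_expected_nodes(health, expected):
    """Return True if the health document reports the expected node count."""
    return health.get("number_of_nodes") == expected


# Usage, with both instances running:
#   health = fetch_cluster_health()
#   print(has_expected_nodes(health, 2))
```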
