ElasticSearch Server

Overview of this book

ElasticSearch is an open source search server built on Apache Lucene. It was built to provide a scalable search solution with built-in support for near real-time search and multi-tenancy.

This book jumps into the world of ElasticSearch by having you set up your own custom cluster, and shows you how to create a fast, scalable, and flexible search solution. By learning the ins and outs of data indexing and analysis, "ElasticSearch Server" will start you on your journey to mastering the powerful capabilities of ElasticSearch. With practical chapters covering how to search data, extend your search, and go deep into cluster administration and search analysis, this book suits readers who are new to search servers as well as those already experienced with them.

In "ElasticSearch Server" you will learn how to revolutionize your website or application with faster, more accurate, and more flexible search functionality. Starting with chapters on setting up your own ElasticSearch cluster and on searching and extending your search parameters, you will quickly be able to create a fast, scalable, and completely custom search solution.

Building on your knowledge further, you will learn about ElasticSearch's query API and become confident using its powerful filtering and faceting capabilities. You will develop practical knowledge of how to make use of ElasticSearch's near real-time capabilities and support for multi-tenancy.

Your journey then concludes with chapters that help you monitor and tune your ElasticSearch cluster, as well as advanced topics such as shard allocation, gateway configuration, and the discovery module.
Table of Contents (17 chapters)
ElasticSearch Server
Credits
About the Authors
Acknowledgement
Acknowledgement
About the Reviewers
www.PacktPub.com
Preface
Index

Index aliasing and simplifying your everyday work using it


When working with multiple indexes in ElasticSearch, you can sometimes lose track of them. Imagine a situation where you store logs in your indexes. The number of log messages is usually quite large, so it is a good idea to divide the data somehow. A logical division of such data is to create one index per day of logs (if you are interested in an open source solution for managing logs, look at Logstash, http://logstash.net). After a while, if we keep all the indexes, it becomes hard to tell which indexes are the newest, which ones should be used, which ones are from the last month, and perhaps which data belongs to which client. Aliases solve this: they let us work with a single name, just as we would with a single index, while that name actually refers to multiple indexes.

An alias

What is an index alias? It's an additional name for one or more indexes that allows us to query those indexes using that name. A single alias can point to multiple indexes and, the other way around, a single index can be part of multiple aliases.

However, please remember that you can't use an alias that points to multiple indexes for indexing or for real-time GET operations; ElasticSearch will throw an exception if you try. This is because ElasticSearch doesn't know which index the data should be indexed into, or from which index the document should be fetched. An alias that points to exactly one index can still be used for indexing, though.
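The rules above can be modelled with a small table of alias/index pairs. The following is a minimal sketch for a POSIX shell that only illustrates the semantics; it is not how ElasticSearch stores aliases. The `ALIASES` table, the second alias `latest`, and the functions `indexes_for`, `aliases_for`, and `write_target` are all hypothetical.

```shell
# Hypothetical alias table for this sketch: one "alias index" pair per
# line. week12 points to three indexes; day12 also belongs to "latest".
ALIASES='week12 day10
week12 day11
week12 day12
latest day12'

# Print every index an alias points to, one per line.
indexes_for() {
  printf '%s\n' "$ALIASES" | while read -r alias_name index_name; do
    if [ "$alias_name" = "$1" ]; then printf '%s\n' "$index_name"; fi
  done
}

# Print every alias an index belongs to, one per line.
aliases_for() {
  printf '%s\n' "$ALIASES" | while read -r alias_name index_name; do
    if [ "$index_name" = "$1" ]; then printf '%s\n' "$alias_name"; fi
  done
}

# Print the only index that indexing (or a real-time GET) through the
# alias could use; fail unless the alias points to exactly one index.
write_target() {
  targets=$(indexes_for "$1")
  count=$(printf '%s\n' "$targets" | grep -c .)
  if [ "$count" -ne 1 ]; then
    echo "alias [$1] points to $count indexes; cannot pick one" >&2
    return 1
  fi
  printf '%s\n' "$targets"
}
```

Here `indexes_for week12` lists day10, day11, and day12, `aliases_for day12` lists both week12 and latest, and `write_target week12` fails because the alias points to three indexes, while `write_target latest` resolves to day12.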

Creating an alias

To create an index alias, we need to send an HTTP POST request to the _aliases REST endpoint with the actions we want performed. For example, the following request will create a new alias called week12 that points to the indexes named day10, day11, and day12:

curl -XPOST 'http://localhost:9200/_aliases' -d '{
  "actions" : [
    { "add" : { "index" : "day10", "alias" : "week12" } },
    { "add" : { "index" : "day11", "alias" : "week12" } },
    { "add" : { "index" : "day12", "alias" : "week12" } }
  ]
}'

If the alias week12 isn't present in our ElasticSearch cluster, the preceding command will create it. If it is present, the command will just add the specified indexes to it.
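With more indexes, writing the actions array by hand becomes tedious. A minimal sketch, assuming a POSIX shell: the hypothetical helper `alias_add_body` (not part of ElasticSearch) prints an actions body of the same shape as the preceding request for any alias and list of indexes.

```shell
# Hypothetical helper: print an _aliases request body that adds every
# index given after the alias name to that alias.
# Usage: alias_add_body ALIAS INDEX [INDEX...]
alias_add_body() {
  alias_name=$1
  shift
  sep=''
  printf '{ "actions" : ['
  for index in "$@"; do
    # A comma is emitted before every action except the first one.
    printf '%s\n    { "add" : { "index" : "%s", "alias" : "%s" } }' "$sep" "$index" "$alias_name"
    sep=','
  done
  printf '\n  ]\n}\n'
}
```

The output can be passed straight to curl, for example `curl -XPOST 'http://localhost:9200/_aliases' -d "$(alias_add_body week12 day10 day11 day12)"`, which sends the same actions as the preceding request.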

If everything goes well, instead of running a search across three indexes as follows:

curl -XGET 'http://localhost:9200/day10,day11,day12/_search?q=test'

we can run it as follows:

curl -XGET 'http://localhost:9200/week12/_search?q=test'

Isn't that better?

Modifying aliases

Of course, you can also remove indexes from an alias. This works just like adding them, except that we use the remove action instead of add. For example, to remove the index named day9 from the week12 alias, we would run the following command:

curl -XPOST 'http://localhost:9200/_aliases' -d '{
  "actions" : [
    { "remove" : { "index" : "day9", "alias" : "week12" } }
  ]
}'

Combining commands

The add and remove actions can be sent in a single request, and ElasticSearch applies all the actions of one request atomically. For example, to combine all the previously sent commands into a single request, we would send the following command:

curl -XPOST 'http://localhost:9200/_aliases' -d '{
  "actions" : [
    { "add" : { "index" : "day10", "alias" : "week12" } },
    { "add" : { "index" : "day11", "alias" : "week12" } },
    { "add" : { "index" : "day12", "alias" : "week12" } },
    { "remove" : { "index" : "day9", "alias" : "week12" } }
  ]
}'
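A typical reason for combining actions is moving a time-based alias forward: when a new daily index appears, it is added to the alias and the oldest index is removed in the same request. The sketch below assumes a POSIX shell, the dayN index naming used in this section, and a hypothetical helper name, `roll_window_body`.

```shell
# Hypothetical helper: build one _aliases request body that keeps ALIAS
# pointing at a sliding window of WINDOW daily indexes named dayN. It
# adds day<NEW_DAY> and removes the index that falls out of the window.
# Usage: roll_window_body ALIAS NEW_DAY WINDOW
roll_window_body() {
  alias_name=$1
  new_day=$2
  old_day=$(( $2 - $3 ))
  cat <<EOF
{
  "actions" : [
    { "add" : { "index" : "day$new_day", "alias" : "$alias_name" } },
    { "remove" : { "index" : "day$old_day", "alias" : "$alias_name" } }
  ]
}
EOF
}
```

For example, `roll_window_body week12 12 3` produces the day12 add and day9 remove actions seen in the preceding request, and `curl -XPOST 'http://localhost:9200/_aliases' -d "$(roll_window_body week12 13 3)"` would add day13 and remove day10.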

Retrieving all aliases

In addition to adding indexes to aliases and removing them, applications that use ElasticSearch may need to retrieve all the aliases available in the cluster or all the aliases an index belongs to. To retrieve them, we send an HTTP GET request. For example, the first of the following commands gets all the aliases for the day10 index, and the second gets all the aliases available in the cluster:

curl -XGET 'localhost:9200/day10/_aliases'
curl -XGET 'localhost:9200/_aliases'

The response from the second command is as follows:

{
  "day10" : {
    "aliases" : {
      "week12" : { }
    }
  },
  "day11" : {
    "aliases" : {
      "week12" : { }
    }
  },
  "day12" : {
    "aliases" : {
      "week12" : { }
    }
  }
}

Filtering aliases

Aliases can be used in a way similar to views in SQL databases. You can define a filter for an alias using the full Query DSL (discussed in detail in the Querying ElasticSearch section in the next chapter) and have it applied automatically to count, search, delete by query, and similar operations. Let's look at an example. Imagine that we want aliases that return data for a certain client, so that we can use them in our application. Let's say that the client identifier is stored in the clientId field and we are interested in client 12345. So, let's create an alias named client on our data index, which will apply a filter on the clientId field automatically:

curl -XPOST 'http://localhost:9200/_aliases' -d '{
  "actions" : [
  {
    "add" : {
      "index" : "data",
      "alias" : "client",
      "filter" : { "term" : { "clientId" : "12345" } }
    }
  } ]
}'

So, when you use the preceding alias, your searches, counts, deletes by query, and similar operations will always be filtered with a term filter that ensures all the documents have the value 12345 in the clientId field.
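When there are many clients, the same kind of filtered alias can be generated for each of them. This is a minimal sketch assuming a POSIX shell; the helper `client_aliases_body` and the client_<id> alias naming scheme are hypothetical, while the data index and the clientId field come from the preceding example.

```shell
# Hypothetical helper: build one _aliases request body that creates a
# filtered alias per client on INDEX. Each alias is named client_<id>
# (an assumed naming scheme) and filters on the clientId field.
# Usage: client_aliases_body INDEX CLIENT_ID [CLIENT_ID...]
client_aliases_body() {
  index_name=$1
  shift
  sep=''
  printf '{ "actions" : ['
  for id in "$@"; do
    # A comma is emitted before every action except the first one.
    printf '%s\n    { "add" : { "index" : "%s", "alias" : "client_%s",\n' "$sep" "$index_name" "$id"
    printf '        "filter" : { "term" : { "clientId" : "%s" } } } }' "$id"
    sep=','
  done
  printf '\n  ]\n}\n'
}
```

Sending its output to the _aliases endpoint with curl -XPOST creates all the aliases in one request; each of them then behaves like the client alias shown above, but for its own client.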

Aliases and routing

Similar to filtering, we can add routing values to aliases. Imagine that we are routing documents based on the user identifier and we want to use the same routing values with our aliases. For the alias named client, we will use the routing values 12345,12346,12347 for indexing, and only 12345 for querying. So, we create the alias with the following command:

curl -XPOST 'http://localhost:9200/_aliases' -d '{
  "actions" : [
  {
    "add" : {
      "index" : "data",
      "alias" : "client",
      "index_routing" : "12345,12346,12347",
      "search_routing" : "12345"
    }
  } ]
}'

This way, when we index our data by using the client alias, the values specified by the index_routing property will be used, and during query time, the one specified by the search_routing property will be used.

If you run the following query with the preceding alias:

curl -XGET 'http://localhost:9200/client/_search?q=test&routing=99999,12345'

The routing value actually used will be 12345. This is because ElasticSearch takes only the values that appear in both the alias's search_routing attribute and the request's routing parameter; in our case, the only common value is 12345.
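The routing rule can be illustrated with a small shell function. This is only a model of the behavior described above, not ElasticSearch code, and `effective_routing` is a hypothetical name. When the request carries no routing parameter, the function falls back to the alias's search_routing values, as described earlier in this section.

```shell
# Model of the rule above: keep the request routing values that also
# appear in the alias's search_routing list (both comma-separated).
# The body runs in a subshell so the IFS change does not leak out.
# Usage: effective_routing ALIAS_SEARCH_ROUTING REQUEST_ROUTING
effective_routing() (
  # No routing parameter in the request: use the alias values as-is.
  if [ -z "$2" ]; then
    printf '%s\n' "$1"
    exit 0
  fi
  result=''
  IFS=','
  for value in $2; do
    for allowed in $1; do
      if [ "$value" = "$allowed" ]; then
        result="${result:+$result,}$value"
        break
      fi
    done
  done
  printf '%s\n' "$result"
)
```

With the alias from this section, `effective_routing 12345 99999,12345` prints 12345, matching the result described above.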