Mastering Elasticsearch - Second Edition

Welcome to the world of Elasticsearch and Mastering Elasticsearch Second Edition. While reading the book, you'll be taken through different topics, all connected to Elasticsearch. Please remember, though, that this book is not meant for beginners; we treat it as a follow-up to, or second part of, Elasticsearch Server Second Edition. There is a lot of new content in this book, and we sometimes refer to the content of Elasticsearch Server Second Edition.

Throughout the book, we will discuss different topics related to Elasticsearch and Lucene. We start with an introduction to the world of Lucene and Elasticsearch, and then move on to the queries provided by Elasticsearch, where we discuss topics such as filtering and which query to choose in a particular situation. Of course, querying is not everything, so this book also provides information on the newly introduced aggregations and on features that will help you give meaning to the data you have indexed in Elasticsearch and provide a better search experience for your users.

Even though, for most users, querying and data analysis are the most interesting parts of Elasticsearch, they are not all that we need to discuss. Because of this, the book also covers index architecture: choosing the right number of shards and replicas, adjusting the shard allocation behavior, and so on. We will also get into the places where Elasticsearch meets Lucene, and we will discuss topics such as the different scoring algorithms and the available store mechanisms, how they differ, and why choosing the proper one matters.

Last, but not least, we touch on the administration part of Elasticsearch by discussing the discovery and recovery modules, and the human-friendly Cat API, which allows us to very quickly get relevant administrative information in a form that most humans should be able to read without parsing JSON responses. We also talk about and use tribe nodes, which make it possible to run federated searches across multiple clusters.

Because of the title of the book, we couldn't omit performance-related topics, and we decided to dedicate a whole chapter to them. We talk about doc values and the improvements they bring, how the garbage collector works, and what to do when it does not work as we expect. Finally, we talk about scaling Elasticsearch and how to prepare it for high indexing and querying use cases.

Just as with the first edition of the book, we decided to end the book with the development of Elasticsearch plugins, showing you how to set up an Apache Maven project and develop two types of plugins: a custom REST action and a custom analysis plugin.

If you think that you are interested in these topics after reading about them, we think this is a book for you and, hopefully, you will like the book after reading the last words of the summary in Chapter 9, Developing Elasticsearch Plugins.

What this book covers

Chapter 1, Introduction to Elasticsearch, guides you through how Apache Lucene works and reintroduces you to the world of Elasticsearch, describing the basic concepts and showing you how Elasticsearch works internally.

Chapter 2, Power User Query DSL, describes how the Apache Lucene scoring works, why Elasticsearch rewrites queries, what query templates are, and how we can use them. In addition to that, it explains the usage of filters and which query should be used in a particular use case.

Chapter 3, Not Only Full Text Search, describes query rescoring, multimatching control, and different types of aggregations that will help you with data analysis: the significant terms aggregation and the top hits aggregation, which allow us to group documents by certain criteria. In addition to that, it discusses relationship handling in Elasticsearch and extends your knowledge about scripting in Elasticsearch.

Chapter 4, Improving the User Search Experience, covers user search experience improvements. It introduces you to the world of Suggesters, which allow you to correct user query spelling mistakes and build efficient autocomplete mechanisms. In addition to that, you'll see how to improve query relevance by using different queries and Elasticsearch functionality in a real-life example.

Chapter 5, The Index Distribution Architecture, covers techniques for choosing the right number of shards and replicas, how routing works, how shard allocation works, and how to alter its behavior. In addition to that, we discuss what query execution preference is and how it allows us to choose where the queries are going to be executed.

Chapter 6, Low-level Index Control, describes how to alter Apache Lucene scoring and how to choose an alternative scoring algorithm. It also covers near real-time (NRT) searching and indexing and transaction log usage, and helps you understand segment merging and tune it for your use case. At the end of the chapter, you will also find information about Elasticsearch caching and the request circuit breakers, which aim to prevent out-of-memory situations.

Chapter 7, Elasticsearch Administration, describes what the discovery, gateway, and recovery modules are, how to configure them, and why you should bother. We also describe what the Cat API is, how to back up and restore your data to different cloud services (such as Amazon AWS or Microsoft Azure), and how to use tribe nodes—Elasticsearch federated search.

Chapter 8, Improving Performance, covers Elasticsearch performance-related topics, ranging from using doc values to reduce field data cache memory usage, through how the JVM garbage collector works and query benchmarking, to scaling Elasticsearch and preparing it for high indexing and querying scenarios.

Chapter 9, Developing Elasticsearch Plugins, covers Elasticsearch plugin development by showing and describing in depth how to write your own REST action and language analysis plugin.

What you need for this book

This book was written using Elasticsearch server 1.4.x, and all the examples and functions should work with it. In addition to that, you'll need a command-line tool that allows you to send HTTP requests, such as curl, which is available for most operating systems. Please note that all examples in this book use curl. If you want to use another tool, please remember to format each request in a way that the tool of your choice understands.
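As a hedged illustration of formatting a request for a tool other than curl, the following minimal sketch rebuilds the sample prefix query used later in this preface with Python's standard urllib module. The helper names build_search_request and send_request are hypothetical, and the host (localhost:9200) and index (clients) are taken from the book's own curl example; the sketch only constructs the request, so nothing is sent unless you call send_request against a running Elasticsearch instance.

```python
import json
import urllib.request


def build_search_request(host, index, query):
    """Build (but do not send) a GET _search request with a JSON body.

    Hypothetical helper for illustration only; it mirrors
    curl -XGET '<host>/<index>/_search?pretty' -d '<json>'.
    """
    url = "http://{}/{}/_search?pretty".format(host, index)
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        # Without an explicit method, urllib switches to POST when data is set.
        method="GET",
    )


def send_request(request):
    """Send the request and return the response text.

    Assumes an Elasticsearch node is reachable at the request's host.
    """
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# The same prefix query that the curl example sends with -d.
prefix_query = {
    "query": {
        "prefix": {
            "name": {"prefix": "j", "rewrite": "constant_score_boolean"}
        }
    }
}

request = build_search_request("localhost:9200", "clients", prefix_query)
print(request.get_method(), request.full_url)
```

Keeping the request construction separate from the sending step makes it easy to inspect the URL, method, and body before anything reaches the cluster.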

In addition to that, to run the examples in Chapter 9, Developing Elasticsearch Plugins, you will need a Java Development Kit (JDK) installed and an editor in which to develop your code (or a Java IDE such as Eclipse). To build the code and manage dependencies in that chapter, we use Apache Maven.

Who this book is for

This book was written for Elasticsearch users and enthusiasts who are already familiar with the basic concepts of this great search server and want to extend their knowledge when it comes to Elasticsearch itself, as well as topics such as how Apache Lucene or the JVM garbage collector works. In addition to that, readers who want to see how to improve their query relevancy and learn how to extend Elasticsearch with their own plugin may find this book interesting and useful.

If you are new to Elasticsearch and you are not familiar with basic concepts such as querying and data indexing, you may find it difficult to use this book, as most of the chapters assume that you have this knowledge already. In that case, we suggest that you look at our previous book about Elasticsearch, Elasticsearch Server Second Edition, Packt Publishing.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text are shown as follows: "We can include other contexts through the use of the include directive."

A block of code is set as follows:

curl -XGET 'localhost:9200/clients/_search?pretty' -d '{
 "query" : {
  "prefix" : {
   "name" : {
    "prefix" : "j",
    "rewrite" : "constant_score_boolean"
   }
  }
 }
}'

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

curl -XGET 'localhost:9200/clients/_search?pretty' -d '{
 "query" : {
  "prefix" : {
   "name" : {
    "prefix" : "j",
    "rewrite" : "constant_score_boolean"
   }
  }
 }
}'

Any command-line input or output is written as follows:

curl -XPOST 'localhost:9200/scoring/doc/1' -d '{"name":"first document"}'

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: "clicking the Next button moves you to the next screen".

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to <[email protected]>, and mention the book title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.

Piracy

Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <[email protected]> with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.

Questions

You can contact us at <[email protected]> if you are having a problem with any aspect of the book, and we will do our best to address it.
