Mastering Elasticsearch 5.x - Third Edition

By: Bharvi Dixit

Overview of this book

Elasticsearch is a modern, fast, distributed, scalable, fault-tolerant, open-source search and analytics engine. Elasticsearch leverages the capabilities of Apache Lucene and provides a new level of control over how you can index and search even huge sets of data. This book will give you a brief recap of the basics and introduce you to the new features of Elasticsearch 5. We will guide you through the intermediate and advanced functionalities of Elasticsearch, such as querying, indexing, searching, and modifying data. We’ll also explore advanced concepts, including aggregation, index control, sharding, replication, and clustering. We’ll show you the monitoring and administration modules available in Elasticsearch, and will also cover backup and recovery. You will learn how to scale your Elasticsearch cluster to suit your context and improve its performance. We’ll also show you how to create your own analysis plugin in Elasticsearch. By the end of the book, you will have all the knowledge necessary to master Elasticsearch and put it to efficient use.

Index versus type - a revised approach for creating indices

At the beginning of this chapter, in the Choosing the right amount of shards and replicas section, we discussed strategies for choosing the right number of shards and replicas for indices. Now we will bring in another factor, document types, which can be taken into account when deciding whether to create indices with more or fewer shards.

Creating too many indices or too many shards is always resource demanding, because every index is made up of shards and every shard is internally a Lucene index, which carries its own overhead in memory usage, file descriptors, and other resources. A larger number of shards or indices also adds overhead at search time. More shards means that a search is executed on more shards, and Elasticsearch has to collect the responses returned from all of them and merge them before sending the final response back to the client. This becomes an expensive process for both aggregations and normal search requests...
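
The search-time cost described above can be illustrated with a toy model. The following is a minimal sketch in plain Python, not Elasticsearch's actual implementation: the names `shard_top_k` and `coordinate_search` are hypothetical, and scores are simple numbers. It assumes each shard returns its own best `k` hits and that a coordinating node merges them into a global top `k`, so the number of candidates to merge grows with the number of shards.

```python
import heapq
from typing import List, Tuple

# A hit is modelled as (score, document id); both are illustrative only.
Hit = Tuple[float, str]


def shard_top_k(shard_hits: List[Hit], k: int) -> List[Hit]:
    """Return the k best hits of one shard, highest score first."""
    return heapq.nlargest(k, shard_hits)


def coordinate_search(shards: List[List[Hit]], k: int) -> Tuple[List[Hit], int]:
    """Merge per-shard results into a global top-k.

    Returns the merged hits and the number of candidate hits the
    coordinating step had to examine (at most k per shard).
    """
    per_shard = [shard_top_k(hits, k) for hits in shards]
    candidates = sum(len(result) for result in per_shard)
    merged = heapq.nlargest(k, (hit for result in per_shard for hit in result))
    return merged, candidates


if __name__ == "__main__":
    # Twelve made-up documents with decreasing scores.
    docs = [(1.0 / (i + 1), f"doc-{i}") for i in range(12)]
    for shard_count in (2, 6):
        # Distribute the documents round-robin over the shards.
        shards = [docs[i::shard_count] for i in range(shard_count)]
        top, examined = coordinate_search(shards, k=2)
        print(f"shards={shard_count} candidates={examined} top={top}")
```

In this model the final top two hits are the same for both layouts, but the merge step examines 4 candidates with 2 shards and 12 candidates with 6 shards. The real engine does considerably more work per shard (and more again for aggregations), so the sketch only conveys the direction of the effect, not its size.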