Monitoring Elasticsearch

By: Dan Noble, Pulkit Agrawal, Mahmoud Lababidi

Overview of this book

Elasticsearch is a distributed search server similar to Apache Solr, with a focus on large datasets, a schema-less setup, and high availability. This schema-free architecture allows Elasticsearch to index and search unstructured content, making it suited to both small projects and large data warehouses holding petabytes of unstructured data. This book is a toolkit for keeping your cluster in good health, and it shows you how to diagnose and treat unexpected issues along the way. You will start with an introduction to Elasticsearch and look at some common performance issues that arise when using the system. You will then see how to install and configure Elasticsearch and the Elasticsearch monitoring plugins, and proceed to install and use the Marvel dashboard to monitor Elasticsearch. You will find out how to troubleshoot some of the common performance and reliability issues that come up when using Elasticsearch. Finally, you will analyze your cluster’s historical performance and learn how to get to the bottom of, and recover from, system failures. This book guides you through several monitoring tools and uses real-world cases and dilemmas faced when using Elasticsearch, showing you how to solve them simply, quickly, and cleanly.

Improving query performance

This section highlights common causes of slow queries on Elasticsearch and offers guidance on improving their performance.

High-cardinality fields

As previously mentioned, running aggregations or sorts against high-cardinality fields (for example, dates precise to the millisecond) can fill up the fielddata cache, which leads to OutOfMemoryError exceptions. However, even without these errors, aggregations and sorts on such fields can be detrimental to performance. When it comes to dates, it's generally a good idea to store and query less precise values (for example, rounded to the hour) in order to speed up query execution.
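
The idea of storing less precise dates can be sketched as a small preprocessing step applied before a document is indexed. This is only an illustration: the `created_at` field name, the derived `created_at_hour` field, and the helper names are hypothetical and not taken from the book.

```python
from datetime import datetime


def truncate_timestamp(ts: datetime, precision: str = "hour") -> datetime:
    """Drop the components below the chosen precision.

    Many millisecond-precise timestamps collapse onto one value,
    which lowers the cardinality of the field.
    """
    if precision == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if precision == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if precision == "minute":
        return ts.replace(second=0, microsecond=0)
    raise ValueError(f"unsupported precision: {precision}")


def prepare_document(doc: dict, field: str = "created_at",
                     precision: str = "hour") -> dict:
    """Return a copy of doc with an extra, coarser date field.

    The original field is kept; the coarse copy (a hypothetical
    naming convention) is the one to aggregate or sort on.
    """
    out = dict(doc)
    ts = datetime.fromisoformat(doc[field])
    out[f"{field}_{precision}"] = truncate_timestamp(ts, precision).isoformat()
    return out


doc = {"text": "hello", "created_at": "2016-03-14T09:26:53.589000+00:00"}
print(prepare_document(doc)["created_at_hour"])
```

Keeping the precise value alongside the coarse one is a design choice made for this sketch: it costs some storage, but the high-cardinality field is then never loaded for aggregations or sorts.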

Querying smaller indices

As Elasticsearch indices grow larger, query performance suffers. One way to improve performance is to run queries against smaller indices. You can do this by storing your data in several smaller indices instead of one large one.
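
One possible way to split data into smaller indices is by time period, so that a query only needs to touch the indices covering the dates it cares about. The sketch below assumes a hypothetical `prefix-YYYY.MM.DD` naming scheme with one index per day; the prefix and function names are illustrative, not something the book prescribes.

```python
from datetime import date, timedelta


def daily_index_name(prefix: str, day: date) -> str:
    """Build the name of the index holding one day of data."""
    return f"{prefix}-{day:%Y.%m.%d}"


def indices_for_range(prefix: str, start: date, end: date) -> list:
    """List the daily indices covering start..end (inclusive)."""
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days
    return [daily_index_name(prefix, start + timedelta(days=i))
            for i in range(days + 1)]


# Three days of data means three small indices to query.
names = indices_for_range("tweets", date(2016, 3, 13), date(2016, 3, 15))
print(",".join(names))
```

Elasticsearch accepts a comma-separated list of index names in a search request path, so the joined string can be used in place of a single large index name.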

For example, with Twitter data, you can change the ingestion process to create a new index every day to store tweets. This...