Elasticsearch 7.0 Cookbook - Fourth Edition

By Alberto Paro

Setting up Linux systems

If you are using a Linux system (generally in a production environment), you need to perform some extra setup to improve performance and to prevent production problems when managing many indices.

This recipe covers the following two common errors that happen in production:

  • Too many open files, which can corrupt your indices and your data
  • Slow performance in search and indexing due to the garbage collector

Big problems arise when you run out of disk space, because some files can become corrupted. To protect your indices from corruption and possible data loss, it is best to monitor the available storage space. By default, Elasticsearch stops allocating new shards to a node when its disk is more than 85% full, and blocks writes to the indices hosted on the node when it passes the 95% flood-stage watermark.
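
These thresholds can be tuned dynamically through the cluster settings API. The following is a minimal sketch that sets the default watermark values explicitly, assuming a node listening on localhost:9200; adjust the percentages to your environment:

curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "95%"
  }
}'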

Getting ready

As we described in the Downloading and installing Elasticsearch recipe in this chapter, you need a working Elasticsearch installation and a simple text editor to change configuration files.

How to do it…

To improve the performance on Linux systems, we will perform the following steps:

  1. First, you need to change the current limits for the user that runs the Elasticsearch server. In these examples, we will call this user elasticsearch.
  2. To allow Elasticsearch to manage a large number of files, you need to increase the number of file descriptors (the number of open files) that a user can manage. To do so, you must edit your /etc/security/limits.conf file and add the following lines at the end:
elasticsearch - nofile 65536
elasticsearch - memlock unlimited
  3. Then, a machine restart is required to make sure that the changes have been applied.
  4. Newer versions of Ubuntu (that is, version 16.04 or later) can skip the /etc/security/limits.conf file for processes launched by init.d scripts. In these cases, you need to edit the files under /etc/pam.d/ and uncomment the following line by removing the leading # (if your distribution runs Elasticsearch as a systemd service instead, see the note after these steps):
# session required pam_limits.so
  5. To control memory swapping, you need to set up the following parameter in elasticsearch.yml (you can verify that the lock is active after restarting the node; see the check after these steps):
bootstrap.memory_lock: true
  6. To fix the amount of memory used by the Elasticsearch server, we need to set the same value for Xms and Xmx in $ES_HOME/config/jvm.options (that is, we set 1 GB of memory in this case), as follows:
-Xms1g
-Xmx1g
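
Note that on distributions where Elasticsearch runs as a systemd service, the limits in /etc/security/limits.conf are not applied to the service at all; they must be set in the systemd unit instead. The following is a minimal sketch, assuming the service is named elasticsearch; run sudo systemctl edit elasticsearch and add these lines to the override file:

[Service]
LimitNOFILE=65535
LimitMEMLOCK=infinity

After saving the override, restart the service with sudo systemctl restart elasticsearch.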
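
Once the node is running again, you can check that the new file descriptor limit and the memory lock are active by querying the nodes APIs. This is a quick check, assuming a node on localhost:9200:

curl "localhost:9200/_nodes/stats/process?filter_path=**.max_file_descriptors"
curl "localhost:9200/_nodes?filter_path=**.mlockall"

The first call should return the new limit (65536 in our example), and the second should return "mlockall": true if bootstrap.memory_lock is working.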

How it works…

The standard limit of file descriptors (https://www.bottomupcs.com/file_descriptors.xhtml), that is, the maximum number of open files for a user, is typically a low value such as 1,024. When you store a lot of records in several indices, you run out of file descriptors very quickly, so your Elasticsearch server becomes unresponsive and your indices may become corrupted, causing you to lose your data.
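
You can inspect the limits currently in force for your shell user with the built-in ulimit command:

ulimit -Sn    # current soft limit for open files
ulimit -Hn    # current hard limit for open files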

Changing the limit to a very high number means that your Elasticsearch server never hits the maximum number of open files.

The other setting prevents Elasticsearch from swapping memory and gives a performance boost in a production environment. This setting is necessary because, during indexing and searching, Elasticsearch creates and destroys a lot of objects in memory. This large number of create/destroy actions fragments the memory and reduces performance: the memory becomes full of holes and, when the system needs to allocate more memory, it suffers the overhead of finding compacted memory. If you don't set bootstrap.memory_lock: true, the operating system is free to swap parts of the Elasticsearch process memory to disk, and paging them back in when they are needed can freeze the system. With this setting, the whole process memory is locked in RAM and is never swapped out, which gives a huge performance boost.
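
Besides locking the process memory, a common complementary step is to reduce the kernel's tendency to swap in the first place. This is a minimal sketch of the usual approach; choose either option depending on whether the machine is dedicated to Elasticsearch:

sudo swapoff -a                  # disable all swap until the next reboot
sudo sysctl -w vm.swappiness=1   # or: tell the kernel to swap only as a last resort

To make the swappiness change permanent, add vm.swappiness=1 to /etc/sysctl.conf.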