Elasticsearch 7.0 Cookbook - Fourth Edition

By: Alberto Paro

Overview of this book

Elasticsearch is a Lucene-based distributed search server that allows users to index and search unstructured content with petabytes of data. With this book, you'll be guided through comprehensive recipes on what's new in Elasticsearch 7, and see how to create and run complex queries and analytics. Packed with recipes on performing index mapping, aggregation, and scripting using Elasticsearch, this fourth edition of Elasticsearch Cookbook will get you acquainted with numerous solutions and quick techniques for performing both everyday and uncommon tasks such as deploying Elasticsearch nodes, integrating other tools with Elasticsearch, and creating different visualizations. You will install Kibana to monitor a cluster and also extend it using a variety of plugins. Finally, you will integrate your Java, Scala, Python, and big data applications such as Apache Spark and Pig with Elasticsearch, and create efficient data applications powered by enhanced functionalities and custom plugins. By the end of this book, you will have gained in-depth knowledge of implementing Elasticsearch architecture, and you'll be able to manage, search, and store data efficiently and effectively using Elasticsearch.

Setting up different node types

Elasticsearch is natively designed for the cloud, so when you need a production environment that handles a huge number of records with high availability and good performance, you need to aggregate several nodes into a cluster.

Elasticsearch allows you to define different types of nodes to balance and improve overall performance.

Getting ready

As described in the Downloading and installing Elasticsearch recipe, you need a working Elasticsearch installation and a simple text editor to change the configuration files.

How to do it…

For the advanced setup of a cluster, there are some parameters that must be configured to define different node types.

These parameters are in the config/elasticsearch.yml file, and they can be set with the following steps:

  1. Set up whether the node can be a master or not, as follows:
     node.master: true
  2. Set up whether a node must contain data or not, as follows:
     node.data: true
  3. Set up whether a node can work as an ingest node, as follows:
     node.ingest: true
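
Taken together, the three steps above amount to a fragment like the following. This is only a minimal sketch of config/elasticsearch.yml; the comments are explanatory additions, and the values shown simply repeat the ones used in the steps.

```yaml
# config/elasticsearch.yml (sketch: only the node-type settings are shown)

# The node may be elected master (true is the default, as explained in this recipe).
node.master: true

# The node stores data and serves indexing and search requests (true is the default).
node.data: true

# The node may act as an ingest node.
node.ingest: true
```

A node must be restarted to pick up changes made to this file.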

How it works…

The node.master parameter establishes that the node can become a master for the cluster. The default value for this parameter is true. A master node is an arbiter for the cluster; it makes decisions about shard management, keeps the cluster state, and is the main controller of every index action. If your master nodes are overloaded, the whole cluster will suffer performance penalties. The node that receives a search request (the coordinating role, which is not restricted to master nodes) distributes the search across all data nodes and aggregates/rescores the results before returning them to the user. In big data terms, it is the Reduce layer in the Map/Reduce search in Elasticsearch.

The number of master-eligible nodes should always be odd, so that a majority can always be formed.

The node.data parameter determines whether the node stores data. The default value for this parameter is true. A data node is a worker that is responsible for indexing and searching data.

By mixing these two parameters, it's possible to have different node types, as shown in the following table:

node.master  node.data  Node description
true         true       This is the default node. It can be elected master and it also holds data.
false        true       This node never becomes a master node; it only holds data. It can be defined as a workhorse for your cluster.
true         false      This node only serves as a master; it stores no data, which keeps its resources free. This will be the coordinator of your cluster.
false        false      This node acts as a search load balancer (fetching data from nodes, aggregating results, and so on). This kind of node is also called a coordinator or client node.

The most frequently used node type is the first one, but if you have a very big cluster or special needs, you can change the roles of your nodes to better serve searches and aggregations.
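
The three non-default rows of the table can be sketched as configuration fragments. In the illustration below, each section separated by --- stands for the config/elasticsearch.yml of a different node; the separators and comments are only a way of showing the variants side by side, and a real file contains just one of them. Setting node.ingest to false on the last node is an addition that does not come from the table; it is shown on the assumption that a pure search load balancer should not run ingest pipelines either.

```yaml
# Data-only node: never becomes master, only holds data (the "workhorse").
node.master: false
node.data: true
---
# Dedicated master node: stores no data, so its resources stay free
# for cluster management.
node.master: true
node.data: false
---
# Coordinator (client) node: neither master nor data; it forwards searches
# to the data nodes and aggregates their results.
node.master: false
node.data: false
# Assumption, not part of the table: also disable the ingest role.
node.ingest: false
```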

There's more…

Related to the number of master nodes, there is a setting that requires at least half of them plus one to be available to ensure that the cluster is in a safe state (no risk of split brain: https://www.elastic.co/guide/en/elasticsearch/reference/6.4/modules-node.html#split-brain). This setting is discovery.zen.minimum_master_nodes, and its value must be computed with the following formula:

(master_eligible_nodes / 2) + 1

To have a High Availability (HA) cluster, you need at least three master-eligible nodes, with the value of minimum_master_nodes set to 2 (3 / 2 rounded down is 1, plus 1).
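
As a worked example of the formula, the three-node HA case could be configured as in the following sketch. The setting name is the one given in the text and in the linked 6.4 reference; the comments are explanatory additions, and the same fragment is assumed to be placed on every master-eligible node.

```yaml
# config/elasticsearch.yml on each of the three master-eligible nodes (sketch)
node.master: true

# (master_eligible_nodes / 2) + 1, using integer division:
# (3 / 2) + 1 = 1 + 1 = 2
discovery.zen.minimum_master_nodes: 2
```

With five master-eligible nodes, the same formula gives (5 / 2) + 1 = 3, so the cluster keeps a valid majority even when two of them are unavailable.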