Elasticsearch 7.0 Cookbook - Fourth Edition

By Alberto Paro

Setting up a node

Elasticsearch allows you to customize several parameters of an installation. In this recipe, we'll look at the most commonly used ones for defining where to store our data and improving overall performance.

Getting ready

As described in the Downloading and installing Elasticsearch recipe, you need a working Elasticsearch installation and a simple text editor to change the configuration files.

How to do it…

The steps required for setting up a simple node are as follows:

  1. Open the config/elasticsearch.yml file with an editor of your choice.
  2. Set up the directories that store your server data, as follows:
  • For Linux or macOS, add the following path entries (using /opt/data as the base path):
path.conf: /opt/data/es/conf
path.data: /opt/data/es/data1,/opt2/data/data2
path.work: /opt/data/work
path.logs: /opt/data/logs
path.plugins: /opt/data/plugins

  • For Windows, add the following path entries (using c:\Elasticsearch as the base path):
path.conf: c:\Elasticsearch\conf
path.data: c:\Elasticsearch\data
path.work: c:\Elasticsearch\work
path.logs: c:\Elasticsearch\logs
path.plugins: c:\Elasticsearch\plugins
  3. Set up the parameters that control the default number of shards and replicas for newly created indices, as follows (a combined example is shown after these steps):
index.number_of_shards: 1
index.number_of_replicas: 1
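
Putting the previous settings together, the following is a minimal sketch of a complete config/elasticsearch.yml for a Linux node; the /opt/data base path is only an example and should be adapted to your own disk layout:

# Directories used by the node (example layout under /opt/data)
path.conf: /opt/data/es/conf
path.data: /opt/data/es/data1,/opt2/data/data2
path.work: /opt/data/work
path.logs: /opt/data/logs
path.plugins: /opt/data/plugins

# Defaults applied to newly created indices
index.number_of_shards: 1
index.number_of_replicas: 1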

How it works…

The path.conf parameter defines the directory that contains your configurations, mainly elasticsearch.yml and logging.yml. The default is $ES_HOME/config, where ES_HOME is the installation directory of your Elasticsearch server.

It's useful to set up the config directory outside your application directory so that you don't need to copy the configuration files every time you update your Elasticsearch server.

The path.data parameter is the most important one. It allows you to define one or more directories (each typically on a different disk) in which to store your index data. When you define more than one directory, they are managed similarly to RAID 0 (their space is summed up), favoring locations with the most free space.
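
As an illustrative sketch, assuming two placeholder mount points, /mnt/disk1 and /mnt/disk2, the multiple locations can also be written as a YAML list, which is equivalent to the comma-separated form shown earlier:

# Two data directories on separate disks, written as a YAML list
path.data:
  - /mnt/disk1/es/data
  - /mnt/disk2/es/data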

The path.work parameter is a location in which Elasticsearch stores temporary files.

The path.logs parameter is where log files are put. How they are managed is configured in logging.yml.

The path.plugins parameter allows you to override the plugins path (the default is $ES_HOME/plugins). It's useful to put system-wide plugins in a shared path (usually via NFS) when you want a single place in which to store the plugins for all of your clusters.
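
For example, assuming a placeholder NFS mount at /mnt/nfs/es-plugins, every node can point at the same shared plugin directory:

# Shared plugin directory mounted via NFS on every node
path.plugins: /mnt/nfs/es-plugins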

The main parameters used to control indices and shards are index.number_of_shards, which sets the default number of shards for a newly created index, and index.number_of_replicas, which sets the initial number of replicas for each shard.
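
As a quick illustration of what these settings imply (the values below are arbitrary examples), the total number of shard copies per index is number_of_shards * (1 + number_of_replicas):

# 4 primary shards, each with 1 replica:
# the cluster allocates 4 * (1 + 1) = 8 shard copies for each new index
index.number_of_shards: 4
index.number_of_replicas: 1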

See also