Elasticsearch Essentials
Overview of this book

With constantly evolving and growing datasets, organizations need to find actionable insights for their business. Elasticsearch, one of the world's most advanced search and analytics engines, makes massive amounts of data usable in a matter of milliseconds. It not only gives you the power to build blazing-fast search solutions over massive amounts of data, but can also serve as a NoSQL data store. This guide will quickly make you a competent developer with a solid understanding of the Elasticsearch core concepts. Starting from the beginning, this book covers these core concepts, setting up Elasticsearch and various plugins, working with analyzers, and creating mappings. It provides complete coverage of working with Elasticsearch using Python, performing CRUD operations and aggregation-based analytics, handling document relationships in the NoSQL world, working with geospatial data, and taking data backups. Finally, we'll show you how to set up and scale Elasticsearch clusters in production environments, along with some best practices.

Installing and configuring Elasticsearch


This book uses Elasticsearch version 2.0.0; you can install another version if you wish by simply replacing 2.0.0 with that version number in the commands that follow. You will need an administrative account to perform the installations and configurations.

Installing Elasticsearch on Ubuntu through Debian package

Let's get started with installing Elasticsearch on Ubuntu Linux. The steps will be the same for all Ubuntu versions:

  1. Download the Elasticsearch Version 2.0.0 Debian package:

    wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-2.0.0.deb
    
  2. Install Elasticsearch, as follows:

    sudo dpkg -i elasticsearch-2.0.0.deb
    
  3. To run Elasticsearch as a service (to ensure Elasticsearch starts automatically when the system is booted), do the following:

    sudo update-rc.d elasticsearch defaults 95 10
    

Installing Elasticsearch on CentOS through the RPM package

Follow these steps to install Elasticsearch on CentOS machines. The same commands work on other Red Hat-based Linux distributions:

  1. Download the Elasticsearch Version 2.0.0 RPM package:

    wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-2.0.0.rpm
    
  2. Install Elasticsearch, using this command:

    sudo rpm -i elasticsearch-2.0.0.rpm
    
  3. To run Elasticsearch as a service (to ensure Elasticsearch starts automatically when the system is booted), use the following:

    sudo systemctl daemon-reload
    sudo systemctl enable elasticsearch.service
    

Understanding the Elasticsearch installation directory layout

The following table shows the directory layout that Elasticsearch creates after installation. The paths have some minor differences depending on the Linux distribution you are using.

Description                                                             | Path on Debian/Ubuntu            | Path on RHEL/CentOS
------------------------------------------------------------------------|----------------------------------|------------------------------------------------------------
Elasticsearch home directory                                            | /usr/share/elasticsearch         | /usr/share/elasticsearch
Elasticsearch and Lucene jar files                                      | /usr/share/elasticsearch/lib     | /usr/share/elasticsearch/lib
Plugins                                                                 | /usr/share/elasticsearch/plugins | /usr/share/elasticsearch/plugins
Binary scripts used to start an ES node and download plugins            | /usr/share/elasticsearch/bin     | /usr/share/elasticsearch/bin
Elasticsearch configuration files (elasticsearch.yml and logging.yml)   | /etc/elasticsearch               | /etc/elasticsearch
Data files of the indices/shards allocated on the node                  | /var/lib/elasticsearch/data      | /var/lib/elasticsearch/data
Startup script with environment variables (heap size, file descriptors) | /etc/init.d/elasticsearch        | /etc/sysconfig/elasticsearch (or /etc/init.d/elasticsearch)
Log files of Elasticsearch                                              | /var/log/elasticsearch/          | /var/log/elasticsearch/

During installation, a user and a group named elasticsearch are created by default. Elasticsearch does not start automatically right after installation; automatic startup is disabled so that a new node does not accidentally join an already running cluster with the same cluster name.

Note

It is recommended to change the cluster name before starting Elasticsearch for the first time.

Configuring basic parameters

  1. Open the elasticsearch.yml file, which contains most of the Elasticsearch configuration options:

    sudo vim /etc/elasticsearch/elasticsearch.yml
    
  2. Now, edit the following ones:

    • cluster.name: The name of your cluster

    • node.name: The name of the node

    • path.data: The path where Elasticsearch will store its data

    Note

    Similar to path.data, we can change path.logs and path.plugins as well. Make sure that all these parameter values are enclosed in double quotes.

  3. After saving the elasticsearch.yml file, start Elasticsearch:

    sudo service elasticsearch start
    

    Elasticsearch will start on two ports, as follows:

    • 9200: This is used to create HTTP connections

    • 9300: This is used for TCP connections from Java clients and for communication between nodes inside a cluster

      Note

      Do not forget to uncomment the lines you have edited. Please note that if you use a new data path instead of the default one, you first need to change the owner and the group of that path to the elasticsearch user.

      The command to change the directory ownership to elasticsearch is as follows:

      sudo chown -R elasticsearch:elasticsearch data_directory_path
      
  4. Run the following command to check whether Elasticsearch has been started properly:

    sudo service elasticsearch status
    

    If the preceding command reports elasticsearch is not running, there must be a configuration issue. Open the log file to see what is causing the error.
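Before starting the node, it can help to sanity-check that the edited settings are actually present in the file. The following is a minimal sketch in Python (the client language used later in this book); the sample config contents, the naive flat-key parser, and the list of required keys are illustrative assumptions, not part of Elasticsearch itself:

```python
# Minimal sanity check for an elasticsearch.yml-style file.
# The sample contents below are a hypothetical config, and this crude
# parser only handles flat "key: value" lines (real YAML is richer).

sample_config = """\
cluster.name: "logging-prod"
node.name: "node-1"
path.data: "/data/es"
"""

def parse_simple_yml(text):
    """Parse flat key: value lines, ignoring comments and blank lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip().strip('"')
    return settings

settings = parse_simple_yml(sample_config)
for key in ("cluster.name", "node.name", "path.data"):
    # Report each expected setting, or flag it as missing.
    print(key, "=", settings.get(key, "<missing>"))
```

In practice you would point such a check at /etc/elasticsearch/elasticsearch.yml; parsing a string keeps the sketch self-contained.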

The list of possible issues that might prevent Elasticsearch from starting is:

  • A Java issue, as discussed previously

  • Indentation issues in the elasticsearch.yml file

  • Less than 1 GB of RAM free for Elasticsearch to use

  • The ownership of the data directory path is not changed to elasticsearch

  • Something is already running on port 9200 or 9300
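The last item in the list above is easy to check programmatically. The following sketch probes whether anything is already listening on the default Elasticsearch ports; the host, timeout, and port values are the usual defaults, but adjust them for your setup:

```python
import socket

def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something is accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 when the connection succeeds.
        return s.connect_ex((host, port)) == 0

for port in (9200, 9300):
    state = "in use" if port_in_use(port) else "free"
    print(f"port {port}: {state}")
```

If a port reports as in use before you have started Elasticsearch, find and stop the conflicting process (or change http.port and transport.tcp.port in elasticsearch.yml).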

Adding another node to the cluster

Adding another node to a cluster is very simple: just follow all the installation steps on another system to set up a new Elasticsearch instance. However, keep the following in mind:

  • In the elasticsearch.yml file, cluster.name is set to the same value on both nodes

  • Both systems are reachable from each other over the network

  • No firewall rule blocks the Elasticsearch ports

  • The Elasticsearch and Java versions are the same on both nodes

You can optionally set the network.host parameter to the IP address to which you want Elasticsearch to bind and on which the other nodes will communicate with it.
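As a sketch, the relevant settings on the two nodes might look like the following; the cluster name, node names, and IP addresses are placeholder assumptions for an example two-node setup:

```yaml
# elasticsearch.yml on the first node (addresses are placeholders)
cluster.name: "my-cluster"
node.name: "node-1"
network.host: "192.168.1.10"

# elasticsearch.yml on the second node
cluster.name: "my-cluster"
node.name: "node-2"
network.host: "192.168.1.11"
```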

Installing Elasticsearch plugins

Plugins provide extra functionalities in a customized manner. They can be used to query, monitor, and manage tasks. Thanks to the wide Elasticsearch community, there are several easy-to-use plugins available. In this book, I will be discussing some of them.

The Elasticsearch plugins come in two flavors:

  • Site plugins: These are the plugins that have a site (web app) in them and do not contain any Java-related content. After installation, they are moved to the site directory and can be accessed using es_ip:port/_plugin/plugin_name.

  • Java plugins: These mainly contain .jar files and are used to extend the functionalities of Elasticsearch. For example, the Carrot2 plugin that is used for text-clustering purposes.

Elasticsearch ships with a plugin script located in the /usr/share/elasticsearch/bin directory, and any plugin can be installed using this script in the following format:

bin/plugin install plugin_url

Note

Once a plugin is installed, you need to restart that node to make the plugin active. Plugins need to be installed separately on each node of the cluster.


Checking for installed plugins

You can check the log of your node, which shows a line like the following at startup time:

[2015-09-06 14:16:02,606][INFO ][plugins                  ] [Matt Murdock] loaded [clustering-carrot2, marvel], sites [marvel, carrot2, head]

Alternatively, you can use the following command:

curl -XGET 'localhost:9200/_nodes/plugins?pretty'

Another option is to use the following URL in your browser:

http://localhost:9200/_nodes/plugins
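The JSON returned by the _nodes/plugins endpoint can be post-processed to get a quick per-node plugin inventory. The following sketch parses a hypothetical, trimmed-down sample of such a response (the node ID, node name, and plugin entries are assumptions modeled on the log line shown above, not captured API output):

```python
import json

# Hypothetical excerpt of a _nodes/plugins response, trimmed to the
# fields used below; the node ID and plugin entries are made up.
sample_response = json.dumps({
    "nodes": {
        "abc123": {
            "name": "Matt Murdock",
            "plugins": [
                {"name": "clustering-carrot2", "site": False, "jvm": True},
                {"name": "head", "site": True, "jvm": False},
            ],
        }
    }
})

def installed_plugins(response_body):
    """Map each node name to the list of its installed plugin names."""
    body = json.loads(response_body)
    return {
        node["name"]: [plugin["name"] for plugin in node.get("plugins", [])]
        for node in body["nodes"].values()
    }

print(installed_plugins(sample_response))
```

Against a live node you would fetch the body with curl or urllib from http://localhost:9200/_nodes/plugins and feed it to the same function.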

Installing the Head plugin for Elasticsearch

The Head plugin is an easy-to-use web front end for an Elasticsearch cluster. It offers various features such as graphical representations of shards, the cluster state, easy query creation, and downloading query-based data in the CSV format.

The following is the command to install the Head plugin:

sudo /usr/share/elasticsearch/bin/plugin install mobz/elasticsearch-head

Restart the Elasticsearch node with the following command to load the plugin:

sudo service elasticsearch restart

Once Elasticsearch is restarted, open the browser and type the following URL to access it through the Head plugin:

http://localhost:9200/_plugin/head

Note

More information about the Head plugin can be found here: https://github.com/mobz/elasticsearch-head

Installing Sense for Elasticsearch

Sense is an excellent tool for querying Elasticsearch. You can add it as an extension to recent versions of the Chrome, Safari, or Firefox browsers.

Now that Elasticsearch is installed and running on your system, and the plugins are installed as well, you are ready to create your first index and perform some basic operations.