Elasticsearch for Hadoop

By: Vishal Shukla

Setting up Elasticsearch


In this section, we will download and configure the Elasticsearch server and install the Elasticsearch Head and Marvel plugins.

Downloading Elasticsearch

To download Elasticsearch, perform the following steps:

  1. First, download Elasticsearch using the following command:

    $ wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.7.1.tar.gz
    
  2. Once the file is downloaded, extract it to /usr/local and rename the extracted directory to something more convenient, using the following commands:

    $ sudo tar -xvzf elasticsearch-1.7.1.tar.gz -C /usr/local
    $ sudo mv /usr/local/elasticsearch-1.7.1 /usr/local/elasticsearch
    
  3. Then, set the eshadoop user as the owner of the directory as follows:

    $ sudo chown -R eshadoop:hadoop /usr/local/elasticsearch
    

Configuring Elasticsearch

The Elasticsearch configuration file, elasticsearch.yml, is located in the config folder under the Elasticsearch home directory. Open it in the editor of your choice by using the following commands:

$ cd /usr/local/elasticsearch
$ vi config/elasticsearch.yml

Uncomment the line with the cluster.name key from the elasticsearch.yml file and change the cluster name, as shown in the following code:

cluster.name: eshadoopcluster

Similarly, uncomment the line with the node.name key and change the value as follows:

node.name: "ES Hadoop Node"
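
The two edits above can also be scripted, which is handy when several machines need the same settings. The following is a minimal sketch and not part of Elasticsearch itself: set_yaml_key is a hypothetical helper, and the sample file contents are only assumed to resemble the commented defaults in elasticsearch.yml. It works on a temporary file, so nothing under /usr/local/elasticsearch is touched.

```shell
#!/bin/sh
# Hypothetical helper (not an Elasticsearch tool): set a top-level key in a
# YAML file, uncommenting an existing "#key: ..." line or appending a new one.
set_yaml_key() {
    file=$1 key=$2 value=$3
    # Escape the dots in the key so grep and sed match them literally.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^#\{0,1\} *${pattern}:" "$file"; then
        sed "s|^#\{0,1\} *${pattern}:.*|${key}: ${value}|" "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    else
        printf '%s: %s\n' "$key" "$value" >> "$file"
    fi
}

# Demonstration on a throwaway file with assumed sample contents.
demo=$(mktemp)
printf '%s\n' '#cluster.name: elasticsearch' '#node.name: "Franz Kafka"' > "$demo"

set_yaml_key "$demo" cluster.name eshadoopcluster
set_yaml_key "$demo" node.name '"ES Hadoop Node"'
cat "$demo"
```

To apply it to a real installation, the file argument would be /usr/local/elasticsearch/config/elasticsearch.yml. Values containing the | or & characters would need extra escaping, which this sketch does not attempt.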

Note

Elasticsearch comes with a sensible default configuration that lets you start nodes with no additional configuration. In a production environment, and sometimes even in a development environment, it may still be desirable to tweak some settings.

By default, Elasticsearch names each node after a Marvel character picked at random from a list of 3,000 names, and assigns the cluster name elasticsearch. With the default configuration, ES nodes on the same network that share a cluster name form a single cluster, and Elasticsearch synchronizes the data between them. This may be unwanted if each developer is looking for an isolated ES server setup. It's always good to specify cluster.name and node.name to avoid unwanted surprises.

You can change the defaults for the settings that start with path.*. To set up the directories that store the server data, locate the paths section, uncomment the highlighted paths, and change them as shown in the following code:

########################### paths #############################
# Path to directory containing configuration (this file and logging.yml):
#
path.conf: /usr/local/elasticsearch/config

# Path to directory where to store index data allocated for this node.
# 
# Can optionally include more than one location, causing data to be striped across
# the locations (a la RAID 0) on a file level, favouring locations with most free
# space on creation. 
path.data: /usr/local/elasticsearch/data


# Path to temporary files:
#
path.work: /usr/local/elasticsearch/work

# Path to log files:
#
path.logs: /usr/local/elasticsearch/logs

Note

It's important to choose the location of path.data wisely. In production, you should make sure that this path is not located inside the Elasticsearch installation directory, in order to avoid accidentally overwriting or deleting the data when upgrading Elasticsearch.
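
One way to catch the mistake described in this note is to compare the configured path.data with the installation directory before an upgrade. The sketch below is illustrative only: path_is_under and configured_data_path are hypothetical helper names, the comparison is purely textual (symbolic links are not resolved), and the demonstration reads a throwaway file rather than the real elasticsearch.yml.

```shell
#!/bin/sh
# Hypothetical helper: succeed when path $1 equals, or lies beneath, directory $2.
# The comparison is textual; symbolic links are not resolved.
path_is_under() {
    case "${1%/}/" in
        "${2%/}/"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print the value of an uncommented path.data line in file $1.
configured_data_path() {
    sed -n 's/^path\.data: *//p' "$1"
}

ES_HOME=/usr/local/elasticsearch

# Demonstration with a throwaway config file holding the value used in this section.
conf=$(mktemp)
echo 'path.data: /usr/local/elasticsearch/data' > "$conf"

data_path=$(configured_data_path "$conf")
if path_is_under "$data_path" "$ES_HOME"; then
    echo "warning: $data_path is inside $ES_HOME"
else
    echo "ok: $data_path is outside $ES_HOME"
fi
```

With the development setting used in this section, the check prints the warning. That is acceptable for a sandbox, but it is the situation the note advises against for production.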

Installing Elasticsearch's Head plugin

Elasticsearch provides a plugin utility to install the Elasticsearch plugins. Execute the following command to install the Head plugin:

$ bin/plugin -install mobz/elasticsearch-head

-> Installing mobz/elasticsearch-head...
Trying https://github.com/mobz/elasticsearch-head/archive/master.zip...
Downloading ....................DONE
Installed mobz/elasticsearch-head into /usr/local/elasticsearch/plugins/head
Identified as a _site plugin, moving to _site structure ...

As indicated by the console output, the plugin is successfully installed in the default plugins directory under the Elasticsearch home. Once the node is running, you can access the Head plugin at http://localhost:9200/_plugin/head/.

Installing the Marvel plugin

Now, let's install the Marvel plugin using a similar command:

$ bin/plugin -i elasticsearch/marvel/latest

-> Installing elasticsearch/marvel/latest...
Trying http://download.elasticsearch.org/elasticsearch/marvel/marvel-latest.zip...
Downloading ....................DONE
Installed elasticsearch/marvel/latest into /usr/local/elasticsearch/plugins/marvel
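
As the console output for both plugins shows, each one ends up in its own directory under plugins in the Elasticsearch home. A quick way to confirm both installations is to check for those directories. The following sketch assumes that layout; plugin_installed is a hypothetical helper, missing-plugin is a made-up name used to show the negative case, and the demonstration uses a temporary directory in place of /usr/local/elasticsearch.

```shell
#!/bin/sh
# Hypothetical helper: succeed when plugin $2 has a directory under $1/plugins.
plugin_installed() {
    [ -d "$1/plugins/$2" ]
}

# Demonstration against a throwaway directory tree; on a real node, pass
# /usr/local/elasticsearch as the first argument instead.
es_home=$(mktemp -d)
mkdir -p "$es_home/plugins/head" "$es_home/plugins/marvel"

for name in head marvel missing-plugin; do
    if plugin_installed "$es_home" "$name"; then
        echo "$name: installed"
    else
        echo "$name: missing"
    fi
done
```

A directory check only confirms that the files were unpacked. Whether the node actually loaded the plugins is reported in the startup log.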

Running and testing

Finally, start Elasticsearch using the following command:

$ ./bin/elasticsearch

The node then prints a startup log similar to the following (the version and build details will reflect your installation):

[2015-05-13 21:59:37,344][INFO ][node                     ] [ES Hadoop Node] version[1.5.1], pid[3822], build[5e38401/2015-04-09T13:41:35Z]
[2015-05-13 21:59:37,346][INFO ][node                     ] [ES Hadoop Node] initializing ...
[2015-05-13 21:59:37,358][INFO ][plugins                  ] [ES Hadoop Node] loaded [marvel], sites [marvel, head]
[2015-05-13 21:59:39,956][INFO ][node                     ] [ES Hadoop Node] initialized
[2015-05-13 21:59:39,959][INFO ][node                     ] [ES Hadoop Node] starting ...
[2015-05-13 21:59:40,133][INFO ][transport                ] [ES Hadoop Node] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.0.2.15:9300]}
[2015-05-13 21:59:40,159][INFO ][discovery                ] [ES Hadoop Node] eshadoopcluster/_bzqXWbLSXKXWpafHaLyRA
[2015-05-13 21:59:43,941][INFO ][cluster.service          ] [ES Hadoop Node] new_master [ES Hadoop Node][_bzqXWbLSXKXWpafHaLyRA][eshadoop][inet[/10.0.2.15:9300]], reason: zen-disco-join (elected_as_master)
[2015-05-13 21:59:43,989][INFO ][http                     ] [ES Hadoop Node] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.0.2.15:9200]}
[2015-05-13 21:59:43,989][INFO ][node                     ] [ES Hadoop Node] started
[2015-05-13 21:59:44,026][INFO ][gateway                  ] [ES Hadoop Node] recovered [0] indices into cluster_state
[2015-05-13 22:00:00,707][INFO ][cluster.metadata         ] [ES Hadoop Node] [.marvel-2015.05.13] creating index, cause [auto(bulk api)], templates [marvel], shards [1]/[1], mappings [indices_stats, cluster_stats, node_stats, shard_event, node_event, index_event, index_stats, _default_, cluster_state, cluster_event, routing_event]
[2015-05-13 22:00:01,421][INFO ][cluster.metadata         ] [ES Hadoop Node] [.marvel-2015.05.13] update_mapping [node_stats] (dynamic)

The startup log gives some useful hints as to what is going on. By default, Elasticsearch uses the port range from 9200 to 9299 for HTTP, allocating the first available port in that range to the node. The log also shows that the node binds to port 9300. Elasticsearch uses the port range from 9300 to 9399 for internal node-to-node communication and for communication with the Java client. To find other nodes in the cluster, it can use either zen multicast or unicast ping discovery, with multicast as the default. We will learn more about these discovery modes in later chapters.
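
Because the HTTP port is picked from a range, the port a node actually received is best read from the startup log rather than assumed. The following sketch pulls the HTTP publish address out of log lines in the format shown above. http_publish_address is a hypothetical helper, and its pattern is an assumption based only on that sample output, so other Elasticsearch versions may need a different expression.

```shell
#!/bin/sh
# Hypothetical helper: read startup log lines on stdin and print the HTTP
# publish address (host:port) found on the "[http ...]" line.
http_publish_address() {
    sed -n 's|.*\[http *\].*publish_address {inet\[/\([^]]*\)\]}.*|\1|p'
}

# Sample line copied from the startup log shown in this section.
sample='[2015-05-13 21:59:43,989][INFO ][http                     ] [ES Hadoop Node] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.0.2.15:9200]}'

address=$(printf '%s\n' "$sample" | http_publish_address)
echo "HTTP endpoint: http://$address/"
```

With the node running, requesting that endpoint (for example, with curl) should return a short JSON document describing the node, which is a simple way to confirm that it is reachable over HTTP.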