Elasticsearch for Hadoop

In the previous section, we set up two plugins: Head and Marvel. In this section, we take a bird's-eye view of how to use these plugins to explore the Elasticsearch documents that we just imported by running the ES-Hadoop MapReduce job.

Viewing data in Head

The Elasticsearch Head plugin provides a simple web frontend for visualizing Elasticsearch indices, cluster and node health, and statistics. Its query-building interface makes it easy to explore indices, types, and documents. It also lets you view Elasticsearch documents in a table-like structure, which can be quite handy for users coming from an RDBMS background.

To see the Elasticsearch Head home page, open http://localhost:9200/_plugin/head in your browser.

The following screenshot shows the home page of the Elasticsearch Head plugin:

The preceding screenshot gives you a quick insight into your cluster: what the cluster health is (Green, Yellow, or Red), how the shards are allocated to different nodes, which indices exist in the cluster, what the size of each index is, and so on. For example, in the preceding screenshot, we can see two indices: .marvel-2015.05.10 and eshadoop. You may be surprised to see .marvel-2015.05.10, because we never created an index with that name. You can ignore this index for the time being; we will take a brief look at it in the next subsection.
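The health information that Head displays is also available directly from the Elasticsearch REST API. The following Python sketch (standard library only) reads the _cluster/health endpoint and condenses the response into one line. It assumes that the node is reachable at http://localhost:9200, as in the Head URL above; the helper names get_json and summarize_health are made up for this illustration and are not part of any Elasticsearch client.

```python
import json
from urllib.request import urlopen

# Assumption: same host and port as the Head URL used in this section.
ES_URL = "http://localhost:9200"


def get_json(path, base_url=ES_URL):
    """GET a path from Elasticsearch and decode the JSON response."""
    with urlopen(base_url + path) as response:
        return json.load(response)


def summarize_health(health):
    """Condense a _cluster/health response into a one-line summary."""
    return (
        "{cluster_name}: {status} "
        "({number_of_nodes} node(s), {active_shards} active shards)"
    ).format(**health)


# Usage against a running node (left commented out on purpose):
# print(summarize_health(get_json("/_cluster/health")))
```

The status field carries the same green, yellow, or red value that Head shows at the top of its home page.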

Let's go back to our WordCount example. You can see that the document count for the eshadoop index in the preceding screenshot exactly matches the number of documents reported in the MapReduce job output that we saw in the last section.
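If you prefer to check the document count without the Head UI, the _count endpoint returns it as JSON. The sketch below is one possible way to read it from Python; the base URL is an assumption for a local single-node setup, and count_url and document_count are hypothetical helper names.

```python
import json
from urllib.request import urlopen

# Assumption: local single-node setup, as used throughout this section.
ES_URL = "http://localhost:9200"


def count_url(index, base_url=ES_URL):
    """Build the URL of the _count endpoint for an index."""
    return "{}/{}/_count".format(base_url, index)


def document_count(index, base_url=ES_URL):
    """Ask Elasticsearch how many documents the index holds."""
    with urlopen(count_url(index, base_url)) as response:
        return json.load(response)["count"]


# Usage against a running node (left commented out on purpose):
# print(document_count("eshadoop"))
```

The returned number can then be compared with the document metric printed by the MapReduce job.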

The following screenshot shows the Browser tab of the Elasticsearch Head plugin:

To take a look at the documents, navigate to the Browser tab. The screen should look similar to the one shown in the preceding screenshot. Click on the eshadoop index on the left-hand side, under the Indices heading, to see the relevant documents. You can see that the output of the MapReduce job is pushed directly to Elasticsearch. Furthermore, you can see the ES document fields, such as _index, _type, _id, and _score, along with the two fields that we are interested in: word and count. You may want to sort the results by clicking on the count column to see the most frequent words in the sample.txt file.
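Sorting by the count column in the Browser tab corresponds to a search request with a sort clause. The following sketch builds such a request body and pulls the word and count fields out of a response. It assumes that count is mapped as a numeric field so that the sort is numeric; top_words_query and extract_words are hypothetical helper names, and no request is actually sent here.

```python
def top_words_query(size=10):
    """Search body that returns the documents with the highest count first."""
    return {
        "query": {"match_all": {}},
        "sort": [{"count": {"order": "desc"}}],
        "size": size,
    }


def extract_words(response):
    """Return (word, count) pairs from a search response, in hit order."""
    return [
        (hit["_source"]["word"], hit["_source"]["count"])
        for hit in response["hits"]["hits"]
    ]
```

The dictionary returned by top_words_query can be serialized to JSON and sent to the eshadoop/_search endpoint.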

Using the Marvel dashboard

Marvel is a monitoring dashboard for real-time and historical analysis that is built on top of Kibana, a data visualization tool for Elasticsearch. The dashboard provides insight into the different metrics of the node, the JVM, and Elasticsearch internals. To open the Marvel dashboard, point your browser to http://localhost:9200/_plugin/marvel/.

The following screenshot gives you an overview of the Marvel dashboard:

You can see the different real-time metrics for your cluster, nodes, and indices. You can visualize the trends of the document count and of the search and indexing request rates graphically. This kind of visualization helps you get a quick insight into the usage pattern of an index and identify candidates for performance optimization. The dashboard also displays vital monitoring stats, such as the CPU usage, the load, the JVM memory usage, the free disk space, and so on. You can filter by time range in the top-right corner to use the dashboard for historical analysis. Marvel stores this historical data in a separate daily rolling index whose name follows a pattern such as .marvel-XXX.
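As a small illustration of the daily rolling index, the following sketch derives an index name from a date. The year.month.day format is an assumption inferred from the single index name seen earlier in Head (.marvel-2015.05.10); marvel_index_name is a hypothetical helper, not a Marvel API.

```python
from datetime import date


def marvel_index_name(day):
    """Name of the Marvel index for a given day.

    Assumption: the suffix is the date formatted as YYYY.MM.DD,
    as inferred from the index name .marvel-2015.05.10.
    """
    return day.strftime(".marvel-%Y.%m.%d")


print(marvel_index_name(date(2015, 5, 10)))  # prints .marvel-2015.05.10
```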

Exploring the data in Sense

Sense is a plugin embedded in Marvel that provides a seamless and easy-to-use REST API client for the Elasticsearch server. It is Elasticsearch-aware and frees you from memorizing the Elasticsearch query syntax by providing autosuggestions. It also helps by highlighting typos and syntax errors.

To open the Sense user interface, go to http://localhost:9200/_plugin/marvel/sense/index.html in your browser.

The following screenshot shows the query interface of Sense:

Now, let's find the documents imported into the eshadoop index by executing a match_all query.

Enter the following query in the query panel on the left-hand side of the Sense interface:

GET eshadoop/_search
{
  "query": {
    "match_all": {}
  }
}

Finally, click on the Send request button to execute the query and obtain the results.
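The same match_all query can also be issued from a script instead of Sense. The following is a minimal sketch using only the Python standard library. It assumes the local URL used throughout this section and uses POST rather than GET, because many HTTP clients do not send a body with GET and the _search endpoint accepts both. The names build_search_request and run_search are hypothetical.

```python
import json
from urllib.request import Request, urlopen

# Assumption: same local node that Sense points to by default in this section.
ES_URL = "http://localhost:9200"


def build_search_request(index, query, base_url=ES_URL):
    """Prepare (but do not send) a _search request for the given query."""
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(
        "{}/{}/_search".format(base_url, index),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_search(request):
    """Send a prepared request and decode the JSON response."""
    with urlopen(request) as response:
        return json.load(response)


# Equivalent of the Sense query shown above.
match_all_request = build_search_request("eshadoop", {"match_all": {}})

# Usage against a running node (left commented out on purpose):
# result = run_search(match_all_request)
# print(result["hits"]["total"])
```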

Note

You can point Sense to a different Elasticsearch server by changing the server field at the top.

By : Vishal Shukla