Elasticsearch for Hadoop

By: Vishal Shukla

The ES-Hadoop architecture


We have explored the Elasticsearch architecture and how Elasticsearch achieves scalability in a distributed environment. Hadoop also works in a distributed environment. In this section, we will explore how ES-Hadoop leverages these two distributed systems and combines their capabilities.

Dynamic parallelism

We are already familiar with the shard as the unit of parallelism in Elasticsearch. The more shards we have, the more parallelism we get, provided that different shards don't compete for the same resources. Similarly, you may already be aware that a split is the unit of parallelism in Hadoop. An InputSplit represents the data input for one mapper. When we run a Hadoop job, the InputFormat divides the input into several InputSplits, and each split is passed on to an individual mapper for further processing.
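To make the idea of a split concrete, here is a minimal conceptual sketch in Python. It does not use the Hadoop API: `Split`, `compute_splits`, and `run_mappers` are hypothetical names chosen for illustration, and a real InputFormat applies more logic when it sizes splits. The sketch only shows the basic pattern of dividing the input into slices and handing each slice to its own mapper.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Split:
    """Hypothetical stand-in for an InputSplit: one contiguous slice of the input."""
    start: int
    length: int


def compute_splits(total_length: int, split_size: int) -> List[Split]:
    """Divide an input of total_length records into slices of at most split_size."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    splits = []
    offset = 0
    while offset < total_length:
        length = min(split_size, total_length - offset)
        splits.append(Split(offset, length))
        offset += length
    return splits


def run_mappers(records: Sequence[int], splits: List[Split],
                mapper: Callable[[Sequence[int]], int]) -> List[int]:
    """Call the mapper once per split; each call sees only its own slice."""
    return [mapper(records[s.start:s.start + s.length]) for s in splits]


# Ten records with a split size of four give three splits, hence three mappers.
records = list(range(10))
splits = compute_splits(len(records), 4)
partial_sums = run_mappers(records, splits, sum)
print(splits)
print(partial_sums)
```

In this sketch the number of mapper calls always equals the number of splits, which is why the split count sets the upper bound on parallelism. In an actual Hadoop job the mappers would run concurrently on different nodes rather than in a sequential loop.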

The following image shows how ES-Hadoop makes the clusters of Hadoop and Elasticsearch talk to each other:

Here, we can see the...