Elasticsearch for Hadoop

By Vishal Shukla


Giving Spark to Elasticsearch


Spark is a distributed computing system that offers a significant performance improvement over Hadoop's MapReduce. It is built on the abstraction of Resilient Distributed Datasets (RDDs), which can be created from any data residing in Hadoop. Unsurprisingly, ES-Hadoop integrates easily with Spark by enabling the creation of RDDs from data stored in Elasticsearch.
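For example, with the elasticsearch-spark connector jar on the classpath, an RDD can be created directly from an Elasticsearch index. The following Scala snippet is a minimal sketch rather than the book's own listing; the node address and the crime/events index name are assumptions made purely for illustration:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.elasticsearch.spark._  // adds esRDD() and saveToEs() to SparkContext/RDDs

    object EsReadExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("es-read-example")
          .set("es.nodes", "localhost:9200")  // assumed local Elasticsearch node

        val sc = new SparkContext(conf)

        // Each element is a (documentId, Map[fieldName -> value]) pair
        val esRdd = sc.esRDD("crime/events")  // hypothetical index/type
        println(s"Documents read from Elasticsearch: ${esRdd.count()}")

        sc.stop()
      }
    }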

Spark's growing support for data sources such as HDFS, Parquet, Avro, S3, Cassandra, relational databases, and streaming data makes it particularly well suited to data integration. This means that, using ES-Hadoop along with Spark, you can easily bring data from all of these sources into Elasticsearch.
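To illustrate that direction of the flow, the following Scala sketch indexes an RDD built from an in-memory collection into Elasticsearch; in a real job, the RDD could just as well be loaded from HDFS, Cassandra, or S3. The sensor/readings index and the field names are purely illustrative:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.elasticsearch.spark._  // adds saveToEs() to RDDs

    object EsWriteExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("es-write-example")
          .set("es.nodes", "localhost:9200")  // assumed local Elasticsearch node
        val sc = new SparkContext(conf)

        // In a real job, this RDD could come from any of the sources mentioned above
        val readings = sc.makeRDD(Seq(
          Map("sensor" -> "s1", "temperature" -> 21.4),
          Map("sensor" -> "s2", "temperature" -> 19.8)
        ))

        // Each Map becomes one JSON document in the target index/type
        readings.saveToEs("sensor/readings")  // hypothetical index/type

        sc.stop()
      }
    }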

Setting up Spark

To set up Apache Spark to execute a job, perform the following steps:

  1. Download the Apache Spark distribution with the following command:

    $ sudo wget -O /usr/local/spark.tgz http://www.apache.org/dyn/closer.cgi/spark/spark-1.4.1/spark-1.4.1-bin-hadoop2.4.tgz...