Spark is a distributed computing system that offers a substantial performance boost over Hadoop's MapReduce. It is built around the abstraction of the RDD (Resilient Distributed Dataset), which can be created for any data residing in Hadoop. Unsurprisingly, ES-Hadoop integrates easily with Spark by letting you create an RDD from data in Elasticsearch.
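To make the idea of an RDD backed by Elasticsearch concrete, here is a minimal PySpark sketch. It assumes the elasticsearch-hadoop JAR is on Spark's classpath and that a cluster is reachable at the address you pass in. The helper names `build_es_read_conf` and `read_es_rdd` are hypothetical, introduced only for illustration; the configuration keys and the input format class come from ES-Hadoop.

```python
def build_es_read_conf(nodes, resource, query=None):
    """Build the ES-Hadoop settings needed to read an index.

    nodes    -- Elasticsearch address, e.g. "localhost:9200" (assumed)
    resource -- index (or index/type) to read from
    query    -- optional URI query ("?q=...") or query DSL string
    """
    conf = {"es.nodes": nodes, "es.resource": resource}
    if query is not None:
        conf["es.query"] = query
    return conf


def read_es_rdd(sc, conf):
    """Create an RDD of (document id, document fields) pairs.

    sc is an existing SparkContext. ES-Hadoop's MapReduce input format
    is reused here through Spark's generic Hadoop API.
    """
    return sc.newAPIHadoopRDD(
        inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf=conf,
    )
```

With a live `SparkContext` named `sc`, a call such as `read_es_rdd(sc, build_es_read_conf("localhost:9200", "logs/entry"))` would return the RDD; the index name `logs/entry` is a placeholder. In Scala, ES-Hadoop's native Spark support offers a shorter route through `sc.esRDD(...)`.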
Spark's growing support for a variety of data sources, such as HDFS, Parquet, Avro, S3, Cassandra, relational databases, and streaming data, makes it especially strong at data integration. This means that with ES-Hadoop and Spark together, you can easily bring data from all these sources into Elasticsearch.
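The reverse direction, pushing data that Spark has loaded from some other source into Elasticsearch, can be sketched in PySpark as follows. This is an illustration under assumptions, not a definitive recipe: the helper names `as_es_pair`, `build_es_write_conf`, and `save_rdd_to_es` are hypothetical, and the elasticsearch-hadoop JAR is again assumed to be on Spark's classpath. The output format class and the `es.*` settings are ES-Hadoop's own.

```python
def as_es_pair(doc):
    """Wrap a document (a dict) as the (key, value) pair the Hadoop
    output API expects; ES-Hadoop ignores the key."""
    return ("ignored", dict(doc))


def build_es_write_conf(nodes, resource, id_field=None):
    """Build the ES-Hadoop settings needed to write to an index.

    id_field -- optional document field to use as the Elasticsearch _id
    """
    conf = {"es.nodes": nodes, "es.resource": resource}
    if id_field is not None:
        conf["es.mapping.id"] = id_field
    return conf


def save_rdd_to_es(pair_rdd, conf):
    """Index every (key, document) pair of the RDD into Elasticsearch.

    The path argument is required by the Hadoop API but is not used
    by ES-Hadoop, hence the "-" placeholder.
    """
    pair_rdd.saveAsNewAPIHadoopFile(
        path="-",
        outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf=conf,
    )
```

Given any RDD of dictionaries, whatever its original source, `save_rdd_to_es(rdd.map(as_es_pair), build_es_write_conf("localhost:9200", "logs/entry"))` would index its contents; the address and index name are placeholders.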