Cascading abstracts away the complexities of MapReduce by providing a data processing platform built on pipes and taps. This section will interest you if you already use Cascading in your projects, or if you are familiar with Cascading and want to integrate it with Elasticsearch. A basic knowledge of Cascading is therefore assumed in this section.
ES-Hadoop comes with a dedicated EsTap that implements SourceSink and SourceTap to provide plug points for integration with Cascading.
Let's write a Cascading job to import data from HDFS into Elasticsearch.
Here is the code for the main() method of the Cascading job's driver class:
Properties props = new Properties();
props.setProperty("es.mapping.id", "id");
FlowConnector flow = new HadoopFlowConnector(props);
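The Properties object in the snippet above can carry other ES-Hadoop settings alongside es.mapping.id. The following is a minimal sketch using only the JDK, so it runs without the Cascading or ES-Hadoop JARs on the classpath. The class name EsCascadingConfig and the helper buildEsProperties are hypothetical names introduced for illustration, and the host and port values are assumed local defaults rather than values taken from this section.

```java
import java.util.Properties;

public class EsCascadingConfig {

    // Hypothetical helper: collects ES-Hadoop settings into the Properties
    // object that would later be passed to new HadoopFlowConnector(props).
    public static Properties buildEsProperties(String nodes, String port, String idField) {
        Properties props = new Properties();
        // Elasticsearch node(s) to connect to (assumed value supplied by the caller)
        props.setProperty("es.nodes", nodes);
        // HTTP port of the Elasticsearch node (assumed value supplied by the caller)
        props.setProperty("es.port", port);
        // Field of each record to use as the Elasticsearch document ID
        props.setProperty("es.mapping.id", idField);
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildEsProperties("localhost", "9200", "id");
        // Print every configured setting; iteration order is not guaranteed
        for (String key : props.stringPropertyNames()) {
            System.out.println(key + "=" + props.getProperty(key));
        }
    }
}
```

Keeping the settings in a small helper like this is one possible design choice: the same Properties can be reused across several flows, and the configuration can be checked without starting a Hadoop job.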
All the standard ES-Hadoop configuration settings that you learned earlier can be specified in the Properties object. The Properties object...