Now that we know how to write a MapReduce job that processes data and pushes it from HDFS to Elasticsearch, let's try a real-world example to get a feel for the value of doing this the ES-Hadoop way.
For illustration purposes, let's consider an example dataset of log files from a hypothetical network security and monitoring tool. This tool acts as a combined gateway and firewall between the devices on the network and the Internet. The firewall detects viruses and spyware, categorizes outgoing traffic, and blocks or allows each request based on the configured policies.
You can download the sample data generated by the tool from https://github.com/vishalbrevitaz/eshadoop/tree/master/ch02. Here is a snippet to give you a quick look at the format:
Jan 01 12:26:26 src="10.1.1.89:0" dst="74.125.130.95" id="None" act="ALLOW" msg="fonts.googleapis.com/css?family...
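Each log entry is a timestamp followed by a series of key="value" fields, which makes it straightforward to parse before indexing into Elasticsearch. As a rough sketch (the class name and regex here are illustrative, not part of the dataset's tooling), the fields of one line could be extracted like this; note that the leading timestamp is not a key="value" pair and would need separate handling:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: parses the key="value" fields of one log line.
public class LogLineParser {

    // Matches fields such as src="10.1.1.89:0" or act="ALLOW".
    private static final Pattern FIELD = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    public static Map<String, String> parse(String line) {
        // LinkedHashMap preserves the order the fields appear in the line.
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "Jan 01 12:26:26 src=\"10.1.1.89:0\" dst=\"74.125.130.95\" "
                + "id=\"None\" act=\"ALLOW\" msg=\"fonts.googleapis.com/css\"";
        Map<String, String> fields = parse(line);
        System.out.println(fields.get("src")); // 10.1.1.89:0
        System.out.println(fields.get("act")); // ALLOW
    }
}
```

In a MapReduce job, logic like this would typically live in the Mapper, turning each raw line into a map of fields that ES-Hadoop can serialize as a JSON document.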