Elasticsearch for Hadoop

By: Vishal Shukla

Going real — network monitoring data


Now that we know how to write a job that processes data with MapReduce and pushes it from HDFS to Elasticsearch, let's try a real-world example to get a feel for the value we gain by doing this the ES-Hadoop way.

For illustration purposes, let's consider an example dataset of log files from a hypothetical network security and monitoring tool. This tool acts as a gateway-cum-firewall between the devices connected to the network and the Internet. The firewall detects viruses and spyware, checks the category of outgoing traffic, and blocks or allows each request based on the configured policies.

Getting and understanding the data

You can download the sample data generated by the tool from https://github.com/vishalbrevitaz/eshadoop/tree/master/ch02. Here is a snippet of the data for a quick look:

Jan 01 12:26:26 src="10.1.1.89:0" dst="74.125.130.95"  id="None" act="ALLOW" msg="fonts.googleapis.com/css?family...
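Each record appears to consist of a syslog-style timestamp followed by key="value" pairs. Before writing the job, it can help to see how such a line might be split into fields. The following is a minimal Python sketch, not the book's code, assuming that every line follows the pattern shown in the snippet. The names parse_line, TIMESTAMP_RE, and FIELD_RE are hypothetical, and the sample line is modeled on the snippet above with the msg value shortened and given a closing quote for illustration.

```python
import re

# Assumption: a record starts with a syslog-style timestamp ("Jan 01 12:26:26")
# followed by key="value" pairs, as in the snippet above (hypothetical names).
TIMESTAMP_RE = re.compile(r'^(?P<ts>[A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2})\s+(?P<rest>.*)$')
FIELD_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_line(line):
    """Split one log record into a dict of fields; return None if it doesn't match."""
    m = TIMESTAMP_RE.match(line.strip())
    if m is None:
        return None
    record = {"timestamp": m.group("ts")}
    # findall yields a (key, value) tuple for every complete key="value" pair
    record.update(FIELD_RE.findall(m.group("rest")))
    # "src" holds an ip:port pair; split it so IP and port become separate fields
    if ":" in record.get("src", ""):
        ip, port = record["src"].rsplit(":", 1)
        record["src_ip"], record["src_port"] = ip, port
    return record


if __name__ == "__main__":
    # Hypothetical sample line modeled on the snippet (msg shortened and closed)
    sample = ('Jan 01 12:26:26 src="10.1.1.89:0" dst="74.125.130.95"  '
              'id="None" act="ALLOW" msg="fonts.googleapis.com/css"')
    print(parse_line(sample))
```

Because FIELD_RE only matches values with both quotes, a value missing its closing quote would be skipped rather than raising an error. In an ES-Hadoop MapReduce job, similar parsing logic would typically live in the mapper, so that each log line becomes a structured document before it is indexed.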