Learning Hunk

By: Dmitry Anoshin, Sergey Sheypak

Overview of this book

Hunk is the big data analytics platform that lets you rapidly explore, analyze, and visualize data in Hadoop and NoSQL data stores. It provides a single, fluid user experience, designed to show you insights from your big data without the need for specialized skills, fixed schemas, or months of development. Hunk goes beyond typical data analysis methods and gives you the power to rapidly detect patterns and find anomalies across petabytes of raw data. This book focuses on exploring, analyzing, and visualizing big data in Hadoop and NoSQL data stores with this powerful, full-featured big data analytics platform. You will begin by learning the Hunk architecture and the Hunk Virtual Index before moving on to how to easily analyze and visualize data using the Splunk Search Processing Language (SPL). Next, you will meet Hunk Apps, which integrate easily with NoSQL data stores such as MongoDB or Sqrrl. You will also discover Hunk knowledge objects, build a semantic layer on top of Hadoop, and explore data using the friendly user interface of Hunk Pivot. You will connect MongoDB and explore data in the data store. Finally, you will go through report acceleration techniques and analyze data in the AWS Cloud.
Table of Contents (14 chapters)

Hunk report acceleration


We can easily accelerate our searches, which is often critical for the business. The idea behind report acceleration is simple: the same search on the same data always gives the same result. In other words, same search + same data = same results. When a report is accelerated, Hunk caches the results and returns them on demand. Moreover, it lets us choose the data range that a particular data summary covers. If the data changes because a fresh portion of events arrives, the accelerated report rebuilds the data summary so that it still covers the chosen data range. Technically, we just cache the output of the map phase in HDFS. When we run the accelerated search, Hunk returns the cached results straight to us. There are four main steps in running an accelerated search:

  1. Build the cache with a scheduled job.

  2. Find cache hits.

  3. Stream the results to a search head.

  4. Reduce on the search head.
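
The four steps above can be illustrated with a small toy model. This is only a sketch of the caching idea, not Hunk's actual implementation: the in-memory dictionary stands in for the cache that Hunk keeps in HDFS, and the names `cache_key`, `map_phase`, `build_cache`, and `run_accelerated` are hypothetical. The fallback that recomputes the map phase on a cache miss is also an assumption made for the sake of the example.

```python
import hashlib
from collections import Counter

# Hypothetical stand-in for the summary cache that Hunk stores in HDFS.
cache = {}


def cache_key(search, split_id):
    """Same search + same data split always yields the same key."""
    return hashlib.sha256(f"{search}|{split_id}".encode()).hexdigest()


def map_phase(events):
    """Toy map phase: count events per status value."""
    return Counter(event["status"] for event in events)


def build_cache(search, splits):
    """Step 1: a scheduled job runs the map phase and caches its output."""
    for split_id, events in splits.items():
        cache[cache_key(search, split_id)] = map_phase(events)


def run_accelerated(search, splits):
    """Steps 2-4: find cache hits, stream partial results, reduce them."""
    partials = []
    hits = 0
    for split_id, events in splits.items():
        key = cache_key(search, split_id)
        if key in cache:
            # Step 2: cache hit, so the map phase is skipped for this split.
            hits += 1
            partials.append(cache[key])
        else:
            # Assumption: data not covered by the cache is mapped as usual.
            partials.append(map_phase(events))
    # Step 3 is modeled by handing the partial results to the reducer;
    # step 4 merges them, as the search head would.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total), hits
```

Calling `build_cache` once and then `run_accelerated` with the same search string and the same splits returns the merged counts with one cache hit per split. Adding a new split afterwards leaves the existing hits in place, and only the new split goes through the map phase.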

Tip

There is more information about search heads at: http://docs.splunk.com/Splexicon:Searchhead.

The...