Learning Hunk

By: Dmitry Anoshin, Sergey Sheypak

Overview of this book

Hunk is a big data analytics platform that lets you rapidly explore, analyze, and visualize data in Hadoop and NoSQL data stores. It provides a single, fluid user experience, designed to surface insights from your big data without the need for specialized skills, fixed schemas, or months of development. Hunk goes beyond typical data analysis methods and gives you the power to rapidly detect patterns and find anomalies across petabytes of raw data. This book focuses on exploring, analyzing, and visualizing big data in Hadoop and NoSQL data stores with this full-featured platform. You will begin by learning the Hunk architecture and the Hunk Virtual Index before moving on to analyzing and visualizing data using the Splunk Search Processing Language (SPL). Next you will meet Hunk apps, which integrate easily with NoSQL data stores such as MongoDB or Sqrrl. You will also discover Hunk knowledge objects, build a semantic layer on top of Hadoop, and explore data using the friendly user interface of Hunk Pivot. You will connect to MongoDB and explore the data in that data store. Finally, you will go through report acceleration techniques and analyze data in the AWS Cloud.
Table of Contents (14 chapters)

A description of time-series aggregated CDR data


In Chapter 1, Meet Hunk, we used the Oozie coordinator to import massive amounts of data. The data is partitioned by date and stored in Avro, a binary format with a schema. This looks like a production-ready approach: Avro is well supported across the whole Hadoop ecosystem. Now we are going to create a custom application using that data. First, have a look at the description of the data.

Here is a description of the data stored in the base table:

  • Square ID: The ID of the square that is part of the Milano grid. Type: numeric.

  • Time interval: The beginning of the time interval, expressed as the number of milliseconds elapsed since the Unix epoch (January 1, 1970, 00:00 UTC). The end of the time interval can be obtained by adding 600,000 milliseconds (10 minutes) to this value.

  • Country code: The phone code of a nation. Depending on the measured activity this value assumes different meanings that are explained later.

  • SMS-in activity: The activity in terms of received...
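
The time interval arithmetic described above can be sketched in a few lines of Python. This is only an illustration and is not code from the book: the function name interval_bounds and the sample timestamp are assumptions chosen for the example, while the 600,000 ms interval length comes from the field description.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Interval length from the field description: 600,000 ms = 10 minutes.
INTERVAL_MS = 600_000

# The Unix epoch as a timezone-aware UTC datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def interval_bounds(start_ms: int) -> Tuple[datetime, datetime]:
    """Return (start, end) of a time interval as UTC datetimes.

    start_ms is the "Time interval" field: milliseconds since the Unix epoch.
    interval_bounds is a hypothetical helper name, not part of any library.
    """
    start = EPOCH + timedelta(milliseconds=start_ms)
    end = EPOCH + timedelta(milliseconds=start_ms + INTERVAL_MS)
    return start, end


# Hypothetical sample value, used only to show the conversion.
start, end = interval_bounds(1383260400000)
print(start.isoformat(), "->", end.isoformat())
```

Working with timezone-aware datetimes keeps the result in UTC regardless of the local time zone of the machine that runs the conversion.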