Learning Hunk

By: Dmitry Anoshin, Sergey Sheypak

Overview of this book

Hunk is a big data analytics platform that lets you rapidly explore, analyze, and visualize data in Hadoop and NoSQL data stores. It provides a single, fluid user experience, designed to show you insights from your big data without the need for specialized skills, fixed schemas, or months of development. Hunk goes beyond typical data analysis methods and gives you the power to rapidly detect patterns and find anomalies across petabytes of raw data. This book focuses on exploring, analyzing, and visualizing big data in Hadoop and NoSQL data stores with this powerful, full-featured big data analytics platform. You will begin by learning the Hunk architecture and the Hunk Virtual Index before moving on to how to easily analyze and visualize data using the Splunk Search Processing Language (SPL). Next, you will meet Hunk Apps, which integrate easily with NoSQL data stores such as MongoDB or Sqrrl. You will also discover Hunk knowledge objects, build a semantic layer on top of Hadoop, and explore data using the friendly user interface of Hunk Pivot. You will connect MongoDB and explore data in the data store. Finally, you will go through report acceleration techniques and analyze data in the AWS Cloud.

Credits

About the Authors

About the Reviewer

www.PacktPub.com

Preface

Meet Hunk

Starting the VM and cluster in VirtualBox

Big data use case

Summary

Explore Hadoop Data with Hunk

Setting up Hunk

Exploring data

Controlling security with Hunk

Summary

Meeting Hunk Features

Knowledge objects

Introducing Pivot

Summary

Adding Speed to Reports

Big data performance issues

Hunk report acceleration

Hunk acceleration limits

Summary

Customizing Hunk

What we are going to do with the Splunk SDK

Dashboard customization using Splunk Web Framework

A description of time-series aggregated CDR data

Implementation

Custom map components

The final result

Summary

Discovering Hunk Integration Apps

What is Mongo?

Counting by shop in a single collection

Counting events in all collections

Summary

Exploring Data in the Cloud

An introduction to Amazon EMR and S3

Integrating Hunk with EMR and S3

Converting Hunk from an hourly rate to a license

Summary

Index

The big problem

Hadoop combines a distributed file system with a distributed computing framework designed to process large volumes of data. It is relatively easy to get data into Hadoop; there are plenty of tools for loading data in different formats, such as Apache Phoenix. However, it is extremely difficult to get value out of the data you put into Hadoop.

Let's look at the path from data to value. First, we collect the data. Then we spend a lot of time preparing it, making sure that the data is available for analysis and that we are able to query it. This process is shown in the following diagram:

Unfortunately, you may not have asked the right questions, or the answers may not be clear, so you have to repeat the cycle, perhaps transforming and reformatting your data again. In other words, it is a long and challenging process.

What you actually want is to collect the data, spend some time preparing it, and then ask questions and get answers repeatedly. That way, you can spend most of your time asking multiple questions, and you can iterate over the data to refine the answers you are looking for. The following diagram illustrates this new approach:
