Learning Hunk

By: Dmitry Anoshin, Sergey Sheypak

Overview of this book

Hunk is the big data analytics platform that lets you rapidly explore, analyze, and visualize data in Hadoop and NoSQL data stores. It provides a single, fluid user experience, designed to show you insights from your big data without the need for specialized skills, fixed schemas, or months of development. Hunk goes beyond typical data analysis methods and gives you the power to rapidly detect patterns and find anomalies across petabytes of raw data. This book focuses on exploring, analyzing, and visualizing big data in Hadoop and NoSQL data stores with this powerful, full-featured big data analytics platform. You will begin by learning the Hunk architecture and Hunk Virtual Index before moving on to how to easily analyze and visualize data using Splunk Search Language (SPL). Next, you will meet Hunk Apps, which integrate easily with NoSQL data stores such as MongoDB or Sqrrl. You will also discover Hunk knowledge objects, build a semantic layer on top of Hadoop, and explore data using the friendly user interface of Hunk Pivot. You will connect MongoDB and explore data in the data store. Finally, you will go through report acceleration techniques and analyze data in the AWS Cloud.
Table of Contents (14 chapters)

Controlling security with Hunk


What is security? Work on data security is never finished; this part of IT infrastructure has no natural end point. Companies are usually interested in these aspects:

  • User/group/access control list-based access to data: Administrators should have something similar to the read/write/execute permissions in Linux, so that we can set who owns the data and who can read or write it.

  • Audit/log access to data: We need to know who accessed the data, when, and how.

  • Isolation: We don't want our data to be publicly accessible. We would like to allow access to clusters only from specific subnets, for example.

We are not going to try to set up all the functionality required for production-ready security. Our aim is to set up simple security that prevents unauthorized users from accessing the data.
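The Linux-style read/write/execute model mentioned in the first bullet can be illustrated with a minimal Python sketch. It is not how Hunk or Hadoop evaluates permissions; the function name `can_access`, the user and group IDs, and the mode `0o640` are all assumptions chosen for illustration, and the check deliberately ignores the superuser and extended ACLs.

```python
import stat


def can_access(mode, owner_uid, owner_gid, uid, gids, want):
    """Classic owner/group/other check; ignores root and extended ACLs."""
    owner_bit, group_bit, other_bit = {
        "r": (stat.S_IRUSR, stat.S_IRGRP, stat.S_IROTH),
        "w": (stat.S_IWUSR, stat.S_IWGRP, stat.S_IWOTH),
        "x": (stat.S_IXUSR, stat.S_IXGRP, stat.S_IXOTH),
    }[want]
    if uid == owner_uid:           # the owner class is checked first
        return bool(mode & owner_bit)
    if owner_gid in gids:          # then the group class
        return bool(mode & group_bit)
    return bool(mode & other_bit)  # everyone else


# Hypothetical dataset: rw- for the owner, r-- for the group, --- for others.
MODE = 0o640
print(stat.filemode(stat.S_IFREG | MODE))                           # -rw-r-----
print(can_access(MODE, 1000, 500, uid=1000, gids={1000}, want="w"))  # owner may write
print(can_access(MODE, 1000, 500, uid=1001, gids={500}, want="w"))   # group may not write
print(can_access(MODE, 1000, 500, uid=1002, gids={1002}, want="r"))  # others may not read
```

Only one permission class applies to a given caller: the owner is judged by the owner bits alone, even if the group or other bits would be more permissive.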

What is data security? It consists of three major parts:

  • Authentication

  • Authorization

  • Audit
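How the three parts listed above fit together can be shown with a small, self-contained Python sketch. It is a toy in-memory model, not the mechanism Hunk or Hadoop uses; the names `USERS`, `GRANTS`, `AUDIT_LOG`, `add_user`, and `access`, the user `alice`, and the path `/data/weblogs` are all hypothetical.

```python
import hashlib
import hmac
import os
from datetime import datetime, timezone

USERS = {}       # user name -> (salt, password hash)
GRANTS = {}      # user name -> {resource: set of allowed actions}
AUDIT_LOG = []   # (timestamp, user, action, resource, outcome)


def _hash(password, salt):
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def add_user(user, password, grants):
    salt = os.urandom(16)
    USERS[user] = (salt, _hash(password, salt))
    GRANTS[user] = grants


def authenticate(user, password):
    """Authentication: is the caller who they claim to be?"""
    if user not in USERS:
        return False
    salt, expected = USERS[user]
    return hmac.compare_digest(expected, _hash(password, salt))


def authorize(user, action, resource):
    """Authorization: may this user perform this action on this resource?"""
    return action in GRANTS.get(user, {}).get(resource, set())


def access(user, password, action, resource):
    """Run both checks and record the outcome, whatever it is (audit)."""
    if not authenticate(user, password):
        outcome = "auth_failed"
    elif not authorize(user, action, resource):
        outcome = "denied"
    else:
        outcome = "allowed"
    AUDIT_LOG.append((datetime.now(timezone.utc), user, action, resource, outcome))
    return outcome == "allowed"


add_user("alice", "s3cret", {"/data/weblogs": {"read"}})
access("alice", "s3cret", "read", "/data/weblogs")    # allowed
access("alice", "s3cret", "write", "/data/weblogs")   # denied: no write grant
access("alice", "wrong", "read", "/data/weblogs")     # auth_failed: bad password
print([entry[4] for entry in AUDIT_LOG])
```

Every attempt is logged, including the failed ones, because rejected requests are often the entries an auditor most wants to see.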

Together, these three properties tell us who did what, where, and with which privileges. Hadoop security setup is still...