Learning Hunk

By: Dmitry Anoshin, Sergey Sheypak

Overview of this book

Hunk is a big data analytics platform that lets you rapidly explore, analyze, and visualize data in Hadoop and NoSQL data stores. It provides a single, fluid user experience designed to surface insights from your big data without the need for specialized skills, fixed schemas, or months of development. Hunk goes beyond typical data analysis methods and gives you the power to rapidly detect patterns and find anomalies across petabytes of raw data. This book focuses on exploring, analyzing, and visualizing big data in Hadoop and NoSQL data stores with this powerful, full-featured analytics platform. You will begin by learning the Hunk architecture and Hunk Virtual Index before moving on to analyzing and visualizing data with the Splunk Search Processing Language (SPL). Next, you will meet Hunk Apps, which integrate easily with NoSQL data stores such as MongoDB and Sqrrl. You will also discover Hunk knowledge objects, build a semantic layer on top of Hadoop, and explore data through the user-friendly interface of Hunk Pivot. You will connect to MongoDB and explore the data in that store. Finally, you will work through report acceleration techniques and analyze data in the AWS Cloud.
Table of Contents (14 chapters)

An introduction to Amazon EMR and S3


In this section, we will learn about Amazon EMR and Simple Storage Service (S3). We will also put these services to work by creating an EMR cluster and an S3 bucket.

Amazon EMR

Amazon EMR is a Hadoop framework offered in the cloud as a managed service. Thousands of customers have used it to launch millions of EMR clusters for a variety of big data use cases, including log analysis, web indexing, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics. With EMR, you can process big data of any type without building and maintaining your own big data infrastructure.

As with other Amazon services, EMR is easy to launch: you fill in a form with the cluster name, the cluster size, and the types of node in the cluster, and within about two minutes EMR creates a fully running cluster that is ready to process data. EMR removes the headache of maintaining clusters and managing version compatibility. Amazon takes care of all the tasks involved in running and supporting Hadoop...
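The same form fields can also be expressed in code. The minimal Python sketch below collects the cluster name, cluster size, and node type into the keyword arguments that the AWS SDK for Python (boto3) accepts for EMR's `run_job_flow` call. It only builds the request and never contacts AWS. Several details are illustrative assumptions rather than anything the book prescribes: the helper name `build_emr_cluster_request`, the `m5.xlarge` instance type, and the `emr-6.10.0` release label. The role names also assume the defaults created by `aws emr create-default-roles`.

```python
from typing import Any, Dict


def build_emr_cluster_request(
    name: str,
    instance_count: int,
    instance_type: str = "m5.xlarge",    # assumption: example node type
    release_label: str = "emr-6.10.0",   # assumption: example EMR release
) -> Dict[str, Any]:
    """Hypothetical helper: map the console form fields (cluster name,
    size, node type) onto keyword arguments for boto3's EMR run_job_flow."""
    if not name:
        raise ValueError("cluster name must not be empty")
    if instance_count < 1:
        raise ValueError("a cluster needs at least one node (the master)")
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Hadoop"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            # InstanceCount is the total number of EC2 instances, master included
            "InstanceCount": instance_count,
            # keep the cluster running after any submitted steps finish
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # assumption: default roles created by `aws emr create-default-roles`
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


if __name__ == "__main__":
    request = build_emr_cluster_request("hunk-demo", instance_count=3)
    print(request["Instances"]["InstanceCount"])
    # With AWS credentials configured, the request could be submitted as:
    #   import boto3
    #   response = boto3.client("emr").run_job_flow(**request)
    #   print(response["JobFlowId"])
```

Keeping the request as a plain dictionary lets you inspect or adjust it before any billable cluster is started. Submitting it is a single `run_job_flow(**request)` call, once you have reviewed the settings.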