Learning Hunk

By : Dmitry Anoshin, Sergey Sheypak

Overview of this book

Hunk is the big data analytics platform that lets you rapidly explore, analyze, and visualize data in Hadoop and NoSQL data stores. It provides a single, fluid user experience, designed to show you insights from your big data without the need for specialized skills, fixed schemas, or months of development. Hunk goes beyond typical data analysis methods and gives you the power to rapidly detect patterns and find anomalies across petabytes of raw data. This book focuses on exploring, analyzing, and visualizing big data in Hadoop and NoSQL data stores with this powerful, full-featured big data analytics platform. You will begin by learning the Hunk architecture and the Hunk Virtual Index before moving on to how to easily analyze and visualize data using the Splunk Search Processing Language (SPL). Next, you will meet Hunk Apps, which can easily integrate with NoSQL data stores such as MongoDB or Sqrrl. You will also discover Hunk knowledge objects, build a semantic layer on top of Hadoop, and explore data using the friendly user interface of Hunk Pivot. You will connect MongoDB and explore data in the data store. Finally, you will go through report acceleration techniques and analyze data in the AWS Cloud.
Table of Contents (14 chapters)

Counting events in all collections


We can access our daily data, which is stored in separate collections and virtual indexes, using a pattern. Let's count the events in each collection and sort by collection size:

Use this expression:

index=clicks_2015_* | stats count by index | sort -count

The results show a trend: users visit shops more often on working days (the 1st of February is a Sunday, the 5th is a Thursday), so the weekday collections receive more clicks.

The next query relies on metadata. We don't name an exact index; instead, we use a wildcard to query several indexes at once:

index=clicks_2015_*

Note

Metadata is data that describes other data. Here, the index name serves as that description: each virtual index is based on a MongoDB collection that holds click events, and each virtual index has a name. The virtual index name is therefore metadata.
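To make the wildcard and the per-index count more concrete, here is a minimal Python sketch of what the query index=clicks_2015_* | stats count by index | sort -count computes. It is only an analogy and not how Hunk executes the search. The full index names (such as clicks_2015_02_01), the sample events, and the helper count_by_index are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Hypothetical events: each one carries the name of the virtual index it came from.
events = [
    {"index": "clicks_2015_02_01", "shop_id": 1},
    {"index": "clicks_2015_02_05", "shop_id": 1},
    {"index": "clicks_2015_02_05", "shop_id": 2},
    {"index": "orders_2015_02_05", "shop_id": 1},  # does not match the pattern
]


def count_by_index(events, pattern):
    """Rough analogue of: index=<pattern> | stats count by index | sort -count"""
    # Select events by index name only (the metadata), not by event content.
    matched = (e["index"] for e in events if fnmatchcase(e["index"], pattern))
    # most_common() returns (index, count) pairs, largest count first.
    return Counter(matched).most_common()


result = count_by_index(events, "clicks_2015_*")
print(result)
# [('clicks_2015_02_05', 2), ('clicks_2015_02_01', 1)]
```

Note that the selection step looks only at the index name, which is why the note above calls the query metadata-based.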

Counting events in shops for observed days

Let's count how many events occurred in each shop on each of the observed days:

index=clicks_2015_* | stats count by index, shop_id | sort +index, -count

We sort by index...
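The grouping and two-key sort in the query above (+index for ascending index name, -count for descending count) can be sketched in Python as follows. This is an illustrative analogy under assumed data, not Hunk's implementation. The index names, shop IDs, and the helper count_by_index_and_shop are hypothetical.

```python
from collections import Counter
from fnmatch import fnmatchcase


def count_by_index_and_shop(events, pattern):
    """Rough analogue of:
    index=<pattern> | stats count by index, shop_id | sort +index, -count
    """
    # Group by the (index, shop_id) pair, like "stats count by index, shop_id".
    counts = Counter(
        (e["index"], e["shop_id"])
        for e in events
        if fnmatchcase(e["index"], pattern)
    )
    # Primary key: index ascending; secondary key: count descending.
    return sorted(
        ((index, shop, n) for (index, shop), n in counts.items()),
        key=lambda row: (row[0], -row[2]),
    )


# Hypothetical click events for two observed days.
clicks = [
    {"index": "clicks_2015_02_05", "shop_id": 2},
    {"index": "clicks_2015_02_01", "shop_id": 1},
    {"index": "clicks_2015_02_05", "shop_id": 1},
    {"index": "clicks_2015_02_05", "shop_id": 2},
]

rows = count_by_index_and_shop(clicks, "clicks_2015_*")
for row in rows:
    print(row)
# ('clicks_2015_02_01', 1, 1)
# ('clicks_2015_02_05', 2, 2)
# ('clicks_2015_02_05', 1, 1)
```

Within each day, the busiest shop appears first, which is the effect of listing -count after +index in the sort.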