Data Analytics Using Splunk 9.x

By: Dr. Nadine Shillingford

Overview of this book

Splunk 9 improves on the existing Splunk tool with important features such as federated search, observability, performance improvements, and dashboarding. This book helps you make the best use of these new features to prepare a Splunk installation that can be employed in the data analysis process. Starting with an introduction to the different Splunk components, such as indexers, search heads, and forwarders, the book takes you through step-by-step installation and configuration instructions for basic Splunk components using Amazon Web Services (AWS) instances. You’ll import the BOTS v1 dataset into a search head and begin exploring data using the Splunk Search Processing Language (SPL), covering various types of Splunk commands, lookups, and macros. After that, you’ll create tables, charts, and dashboards using Splunk’s new Dashboard Studio, and then advance to work with clustering, container management, data models, federated search, bucket merging, and more. By the end of the book, you’ll not only have learned about the latest features of Splunk 9 but also have a solid understanding of the performance tuning techniques in the latest version.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Conventions used

Get in touch

Share Your Thoughts

Download a free PDF copy of this book

Part 1: Getting Started with Splunk

Chapter 1: Introduction to Splunk and its Core Components

Splunking big data

Exploring Splunk components

Introducing the case study – splunking the BOTS Dataset v1

Summary

Chapter 2: Setting Up the Splunk Environment

Technical requirements

Installing Splunk Enterprise

Setting up Splunk forwarders

Setting up Splunk deployment servers

Setting up Splunk indexers

Setting up Splunk search heads

Installing additional Splunk add-ons and apps

Managing access to Splunk

Summary

Chapter 3: Onboarding and Normalizing Data

Exploring inputs.conf using the Splunk Add-on for Microsoft Windows

Extracting fields using props.conf and transforms.conf

Creating event types and tagging

Summary

Part 2: Visualizing Data with Splunk

Chapter 4: Introduction to SPL

Understanding the Splunk search interface

Dissecting a Splunk query

Formatting and transforming data

Summary

Chapter 5: Reporting Commands, Lookups, and Macros

Exploring more Splunk commands

Enhancing logs with lookups

Simplifying Splunk searches with macros

Summary

Chapter 6: Creating Tables and Charts Using SPL

Creating and formatting tables

Creating and formatting charts

Creating advanced charts

Summary

Chapter 7: Creating Dynamic Dashboards

Adding tables and charts to dashboards

Adding inputs, tokens, and drilldowns

Exploring the dashboard source

Adding reports and drilldowns to dashboards

Experimenting with the new Dashboard Studio

Summary

Part 3: Advanced Topics in Splunk

Chapter 8: Licensing, Indexing, and Buckets

Understanding Splunk indexing and buckets

Exploring Splunk queues

Discussing Splunk licensing models

Summary

Chapter 9: Clustering and Advanced Administration

Introducing Splunk clusters

Understanding search head clusters

Understanding indexer clusters

Summary

Chapter 10: Data Models, Acceleration, and Other Ways to Improve Performance

Understanding data models

Accelerating data models

Improving performance

Summary

Chapter 11: Multisite Splunk Deployments and Federated Search

Exploring multisite Splunk deployments

Configuring federated search

Using federated search

Summary

Chapter 12: Container Management

Understanding container management

Deploying Splunk in Docker

Getting started with Splunk Operator for Kubernetes

Exploring container logs using Splunk

Summary

Index

Why subscribe?

Other Books You May Enjoy

Packt is searching for authors like you

Share Your Thoughts

Download a free PDF copy of this book

Splunking big data

Splunk is a big data tool. In this book, we will introduce the idea of using Splunk to solve problems that involve large amounts of data. When I worked on the IT security team, the problem was obvious – we needed to use security data to identify malicious activity. Defining the problem you are trying to solve will determine what kind of data you collect and how you analyze that data. Not every problem requires a big data solution. Sometimes, a traditional database solution might work just as well and at a lower cost. So, how do you know if you’re dealing with a big data problem? There are three V’s that help define big data:

High Volume: A big data problem usually involves large volumes of data. Most times, the amount of data is greater than what can fit into traditional database solutions.
High Velocity: Traditional database solutions are usually not able to handle the speed at which modern data enters a system. Imagine trying to store and manage data from user clicks on a website such as amazon.com in a traditional database. Traditional databases are not designed to support that many write operations.
High Variety: A problem requiring analysis of big data involves a variety of data sources of varying formats. An IT security SIEM may have data being logged from multiple data sources, including firewall devices, email traces, DNS logs, and access logs. Each of these logs has a different format and correlating all the logs requires a heavy-duty system.
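
To make the variety problem concrete, the following minimal sketch normalizes two differently formatted log lines into one common schema so that they can be sorted and correlated by time and source IP. Both log formats, the field names, and the sample values are invented for illustration; real firewall and DNS logs will differ, and this is not how Splunk itself parses data.

```python
import json
import re
from datetime import datetime, timezone

# Hypothetical firewall line: ISO timestamp followed by key=value pairs.
FIREWALL_LINE = "2021-08-10T14:03:22Z action=deny src=10.0.0.5 dst=203.0.113.9"
# Hypothetical DNS record: JSON with an epoch timestamp.
DNS_LINE = '{"ts": 1628604205, "client": "10.0.0.5", "query": "example.com"}'


def parse_firewall(line: str) -> dict:
    """Turn the key=value firewall line into the common schema."""
    timestamp, _, rest = line.partition(" ")
    fields = dict(re.findall(r"(\w+)=(\S+)", rest))
    parsed_time = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return {
        "time": parsed_time.replace(tzinfo=timezone.utc),
        "source": "firewall",
        "src_ip": fields["src"],
        "detail": f"{fields['action']} -> {fields['dst']}",
    }


def parse_dns(line: str) -> dict:
    """Turn the JSON DNS record into the same common schema."""
    record = json.loads(line)
    return {
        "time": datetime.fromtimestamp(record["ts"], tz=timezone.utc),
        "source": "dns",
        "src_ip": record["client"],
        "detail": f"query {record['query']}",
    }


# Once both sources share a schema, correlation becomes a simple sort or join.
events = sorted(
    [parse_dns(DNS_LINE), parse_firewall(FIREWALL_LINE)],
    key=lambda event: event["time"],
)
for event in events:
    print(event["time"].isoformat(), event["source"], event["src_ip"], event["detail"])
```

Every new data source needs its own parser in a hand-rolled approach like this, which is why the effort grows quickly as the number of formats increases.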

Here are some cases that can be solved using big data:

A retail company wants to determine how product placement in stores affects sales. For example, research may show that placing packs of Cheetos near the Point of Sale (POS) devices increases sales for customers with small children. Target, for instance, assigns a guest ID number to every customer and correlates this ID number with the customer’s credit card number and transactions.
A rental company wants to measure the times of year that are busiest to ensure that there is a sufficient inventory of vehicles at different locations. In addition, they may find that a certain type of vehicle is more suitable for a particular area of town.
A public school district wants to explore data pulled from multiple district schools to determine the effect of remote classes on certain demographics.
An online shop wants to use customer traffic to determine the peak time for posting ads or giving discounts.
An IT security team may use datasets containing firewall logs, DNS logs, and user access to hunt down a malicious actor on the network.

Now, let’s look at how big data is generated.

How is big data generated?

Infographics published by FinancesOnline (https://financesonline.com) indicated that humans created, captured, copied, and consumed about 74 zettabytes of data in 2021. That number is estimated to grow to 149 zettabytes in 2024.
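
As a rough worked example, the two figures quoted above imply a yearly growth rate if we assume steady compound growth between 2021 and 2024. That growth model is an assumption made here for illustration; the source figures do not state one.

```python
# Figures quoted in the text, in zettabytes.
VOLUME_2021_ZB = 74
VOLUME_2024_ZB = 149
YEARS = 2024 - 2021


def compound_annual_growth_rate(start: float, end: float, years: int) -> float:
    """Return the constant yearly growth rate that turns start into end."""
    return (end / start) ** (1 / years) - 1


growth = compound_annual_growth_rate(VOLUME_2021_ZB, VOLUME_2024_ZB, YEARS)
ratio = VOLUME_2024_ZB / VOLUME_2021_ZB

print(f"Overall increase: {ratio:.2f}x over {YEARS} years")
print(f"Implied annual growth: {growth:.1%}")
```

Under that assumption, the data volume roughly doubles in three years, which corresponds to a growth rate of a little over a quarter per year.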

The volume of data seen in the last few years can be attributed to increases in three types of data:

Machine data: Data generated by machines such as operating systems and application logs
Social data: Data generated by social media systems
Transactional data: Data generated by e-commerce systems

We are surrounded by digital devices, and as the capacity and capabilities of these devices increase, the amount of data generated also increases. Modern devices such as phones, laptops, watches, smart speakers, cars, sensors, POS devices, and household appliances all generate large volumes of machine data in a wide variety of formats. Many times, this data stays untouched because the data owners do not have the ability, time, or money to analyze it.

The prevalence of smartphones is possibly another contributor to the exponential increase in data. IBM’s Simon Personal Communicator, the first mainstream mobile telephone introduced in 1992, had very limited capability. It cost a whopping $899 with a service contract. Out of the box, a user could use the Simon to make calls and send and receive emails, faxes, and pages. It also contained a notebook, address book, calendar, world clock, and scheduler features. IBM sold approximately 50,000 units (https://time.com/3137005/first-smartphone-ibm-simon/).

Figure 1.1 shows the first smartphone to have the functions of a phone and a Personal Digital Assistant (PDA):

Figure 1.1 – The IBM Simon Personal Communicator introduced in 1992

The IBM Simon Personal Communicator is archaic compared to the average cellphone today. Apple sold 230 million iPhones in 2020 (https://www.businessofapps.com/data/apple-statistics/). iPhone users generate data when they browse the web, listen to music and podcasts, stream television and movies, conduct business transactions, and post to and browse social media feeds. This is in addition to the features that were found in the IBM Simon, such as sending and receiving emails. Each of these applications generates volumes of data. Just one application such as Facebook running on an iPhone involves a variety of data – posts, photos, videos, transactions from Facebook Marketplace, and so much more. Figure 1.2 shows data from Our World in Data (https://ourworldindata.org/internet) that illustrates the rapid increase in users of social media:

Figure 1.2 – Number of people using social media platforms, 2005 to 2019

In the next section, we’ll explore how we can use Splunk to process all this data.

Understanding Splunk

Now that we understand what big data is, its applications, and how it is generated, let’s talk about Splunk Enterprise and how Splunk can be used to manage big data. For simplicity, we will refer to Splunk Enterprise as Splunk.

Splunk was founded in 2003 by Michael Baum, Rob Das, and Erik Swan. It was designed to search, monitor, and analyze machine-generated data. Splunk can handle high-volume, high-variety data generated at high velocity, which makes it well suited to dealing with big data. Splunk works on various platforms, including Windows (32- and 64-bit), Linux (64-bit), and macOS. It can be installed on physical devices, on virtual machines running under hypervisors such as VirtualBox and VMware, and on virtual cloud instances such as those offered by Amazon Web Services (AWS) and Microsoft Azure. Customers can also sign up for the Splunk Cloud Platform, which supplies the user with a hosted Splunk deployment. Using AWS instances or Splunk Cloud frees the user from having to deploy and maintain physical servers.

There is a free 60-day trial of Splunk that allows the user to index 500 MB of data daily. Once the user has used the product for 60 days, they can switch to a perpetual free license or purchase a Splunk license. The 60-day trial is a great way to get your feet wet. Traditionally, the paid version of Splunk was billed by volume: the more data you index, the more you pay. However, new pricing models such as workload and ingest pricing have been introduced in recent years.

In addition to the core Splunk tool, there are various free and paid applications, such as Splunk Enterprise Security, Splunk SOAR, and Splunk User Behavior Analytics (UBA), as well as observability solutions such as Splunk Observability Cloud.

Splunk was designed to index a variety of data. This is accomplished via pre-defined configurations that allow Splunk to recognize the format of different data sources. In addition, splunkbase.com is a constantly growing repository of 1,000+ apps and Technical Add-Ons (TAs) developed by Splunk, Splunk partners, and the Splunk community. One of the most important features of these TAs is that they include configurations for automatically extracting fields from raw data. Unlike traditional databases, Splunk can index large volumes of data. A dedicated Splunk Enterprise indexer can index over 20 MB of data per second, or about 1.7 TB per day. The amount of data that Splunk is capable of indexing can be increased by adding indexers. There are many use cases for which Splunk is a great solution.
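
The daily figure follows from the per-second rate by simple arithmetic, assuming it is expressed in terabytes with decimal units (1 TB = 1,000,000 MB). The sketch below works through that conversion and then estimates how many indexers a given daily volume would need. The linear scaling and the 5 TB/day workload are assumptions for illustration only; real sizing also depends on search load, retention, and hardware.

```python
import math

MB_PER_SECOND = 20          # per-indexer rate quoted in the text
SECONDS_PER_DAY = 24 * 60 * 60
MB_PER_TB = 1_000_000       # decimal units (assumption)


def daily_capacity_tb(mb_per_second: float) -> float:
    """Convert a sustained MB/s indexing rate into TB per day."""
    return mb_per_second * SECONDS_PER_DAY / MB_PER_TB


def indexers_needed(daily_ingest_tb: float, mb_per_second: float = MB_PER_SECOND) -> int:
    """Estimate the indexer count, assuming capacity scales linearly."""
    return math.ceil(daily_ingest_tb / daily_capacity_tb(mb_per_second))


per_indexer_tb = daily_capacity_tb(MB_PER_SECOND)
print(f"One indexer at {MB_PER_SECOND} MB/s: {per_indexer_tb:.3f} TB/day")

# Hypothetical workload of 5 TB/day.
print(f"Indexers for 5 TB/day: {indexers_needed(5)}")
```

The conversion gives 1.728 TB per day for one indexer, which matches the rounded figure of 1.7.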

Table 1.1 highlights how Splunk improved processes at The University of Arizona, Honda, and Lenovo:

Use Case	Company	Details
Security	The University of Arizona	The University of Arizona used Splunk Remote Work Insights (RWI) to help with the challenges of remote learning during the pandemic (https://www.splunk.com/en_us/customers/success-stories/university-of-arizona.html)
IT Operations	Honda	Honda used predictive analytics to increase efficiency and solve problems before they became machine failures or interruptions in their production line (https://tinyurl.com/5n7f7naz)
DevOps	Lenovo	Lenovo reduced the amount of time spent in troubleshooting by 50% and maintained 100% uptime despite a 300% increase in web traffic (https://tinyurl.com/yactu398)

Table 1.1 – Examples of success stories from Splunk customers

We will look at some of the major components of Splunk in the next section.
