Improving Your Splunk Skills

By: James D. Miller, Paul R. Johnson, Josh Diakun, Derek Mock

Overview of this book

Splunk makes it easy for you to take control of your data and drive your business with the cutting edge of operational intelligence and business analytics. Through this Learning Path, you'll implement new services and use them to process machine-generated big data quickly and efficiently. You'll begin with an introduction to the new features, improvements, and offerings of Splunk 7. You'll learn to use wildcards efficiently and to modify your searches to make them faster. You'll learn how to enhance your applications by using XML dashboards and by configuring and extending Splunk. You'll also find step-by-step demonstrations that walk you through building an operational intelligence application. As you progress, you'll explore data models and pivots to extend your intelligence capabilities. By the end of this Learning Path, you'll have the skills and confidence to implement various Splunk services in your projects. This Learning Path includes content from the following Packt products: Implementing Splunk 7 - Third Edition, by James Miller, and Splunk Operational Intelligence Cookbook - Third Edition, by Paul R. Johnson, Josh Diakun, et al.
Table of Contents (21 chapters)
Title Page

Calculating top for a large time frame

One common problem is finding the top contributors among a huge set of unique values. For instance, if you want to know which IP addresses used the most bandwidth in a given day or week, you may have to keep track of the total request size for each of millions of unique hosts to answer the question definitively. With summary indexes, this means storing millions of events in the summary index, which quickly defeats the purpose of using a summary index.
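To see why an exact answer is expensive, here is a minimal Python sketch of the brute-force approach the paragraph describes: keep one running total per distinct IP, then take the largest n. It is only an illustration outside Splunk; the function name top_talkers and the sample events are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_talkers(events: Iterable[Tuple[str, int]], n: int) -> List[Tuple[str, int]]:
    """Return the n IPs with the largest total request size.

    An exact answer needs one running total per distinct IP, so memory
    (or, in a summary index, the number of stored rows) grows with the
    number of unique hosts.
    """
    totals: Counter = Counter()
    for ip, size in events:
        totals[ip] += size
    return totals.most_common(n)


# Hypothetical sample events: (client IP, request size)
events = [
    ("10.0.0.1", 500),
    ("10.0.0.2", 1200),
    ("10.0.0.1", 900),
    ("10.0.0.3", 100),
]
print(top_talkers(events, 2))  # [('10.0.0.1', 1400), ('10.0.0.2', 1200)]
```

With three hosts the counter is tiny, but with millions of unique hosts it holds millions of entries, which is exactly the cost the summary index was meant to avoid.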

Just to illustrate, let's look at a simple set of data:

Time    1.1.1.1  2.2.2.2  3.3.3.3  4.4.4.4  5.5.5.5  6.6.6.6
12:00   99       100      100      100
13:00   99                100      100      100
14:00   99       100               101      100
15:00   99                99       100      100
16:00   99       100                        100      100
total   495      300      299      401      400      100
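The arithmetic behind this example can be checked with a small Python sketch: it loads the hourly figures from the table above, sums them per IP, and then repeats the sum after keeping only each hour's three largest entries. The helper names totals and top_n_per_hour are illustrative, not part of Splunk. At 15:00, 1.1.1.1 and 3.3.3.3 tie at 99; the sketch breaks that tie by IP string, which is an arbitrary assumption.

```python
from collections import Counter

# Hourly values per IP, taken from the table above.
# An IP that is absent from an hour had nothing recorded in that hour.
hourly = {
    "12:00": {"1.1.1.1": 99, "2.2.2.2": 100, "3.3.3.3": 100, "4.4.4.4": 100},
    "13:00": {"1.1.1.1": 99, "3.3.3.3": 100, "4.4.4.4": 100, "5.5.5.5": 100},
    "14:00": {"1.1.1.1": 99, "2.2.2.2": 100, "4.4.4.4": 101, "5.5.5.5": 100},
    "15:00": {"1.1.1.1": 99, "3.3.3.3": 99, "4.4.4.4": 100, "5.5.5.5": 100},
    "16:00": {"1.1.1.1": 99, "2.2.2.2": 100, "5.5.5.5": 100, "6.6.6.6": 100},
}


def totals(rows):
    """Sum each IP's values across all hourly rows."""
    out = Counter()
    for row in rows:
        out.update(row)  # Counter.update adds values rather than replacing them
    return out


def top_n_per_hour(hourly, n):
    """Keep only each hour's n largest entries, as an hourly top-n summary would.

    Ties are broken by IP string (an arbitrary choice) so the output is deterministic.
    """
    return [
        dict(sorted(row.items(), key=lambda kv: (-kv[1], kv[0]))[:n])
        for row in hourly.values()
    ]


true_totals = totals(hourly.values())
summary_totals = totals(top_n_per_hour(hourly, 3))

print(true_totals.most_common(3))     # [('1.1.1.1', 495), ('4.4.4.4', 401), ('5.5.5.5', 400)]
print(summary_totals.most_common(3))  # [('4.4.4.4', 401), ('5.5.5.5', 400), ('2.2.2.2', 300)]
```

1.1.1.1 has the largest overall total (495), but it never records more than 99 in any single hour, so it loses to the 100-level entries in almost every hour and drops out of the top three once only hourly top-three entries are kept.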

If we only stored the top three IPs per hour, our dataset would look like the following:

Time 1.1.1.1 2.2.2.2 3.3.3.3 4.4.4.4 5.5.5.5 6.6...