Implementing Splunk 7, Third Edition - Third Edition

Overview of this book

Splunk is the leading platform for searching, monitoring, and analyzing growing volumes of machine-generated big data. This book shows you how to implement new services and use them to process that data quickly and efficiently. We introduce the new features, improvements, and offerings of Splunk 7, including Splunk Cloud and the Machine Learning Toolkit. You will learn to use search terms effectively with Boolean and grouping operators, to modify your searches so that they run faster, and to use wildcards efficiently. Later, you will learn how to use stats to aggregate values, chart to turn data, and timechart to show values over time; you will also work with fields and chart enhancements and learn how to create a data model and speed it up with data model acceleration. After that, you will learn about XML dashboards, working with apps, building advanced dashboards, configuring and extending Splunk, advanced deployments, and more. Finally, we cover the Machine Learning Toolkit, along with best practices and tips to help you implement Splunk services effectively. By the end of this book, you will have learned about the Splunk software as a whole and implemented Splunk services in your own projects.

Calculating top for a large time frame


One common problem is finding the top contributors in a huge set of unique values. For instance, if you want to know which IP addresses are using the most bandwidth in a given day or week, you may have to keep track of the total request size across millions of unique hosts to answer the question definitively. With summary indexes, this means storing millions of events in the summary index, which quickly defeats the purpose of summary indexes.

Just to illustrate, let's look at a simple set of data:

Time 1.1.1.1 2.2.2.2 3.3.3.3 4.4.4.4 5.5.5.5 6.6.6.6 
12:00 99 100 100 100 
13:00 99 100 100 100 
14:00 99 100 101 100 
15:00 99 99 100 100 
16:00 99 100 100 100 
total 495 300 299 401 400 100 

If we only stored the top three IPs per hour, our dataset would look like the following:

Time 1.1.1.1 2.2.2.2 3.3.3.3 4.4.4.4 5.5.5.5 6.6.6.6 
12:00 100 100 100 
13:00 100 100 100 
14:00 100 101 100 
15:00 99 100 100 
16:00 100 100 100 
total 300 299 401 400...
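
The effect of keeping only the top three IPs per hour can be checked with a short Python sketch. The per-IP totals below match the first table, but which hour each value belongs to (other than the 99 that 1.1.1.1 records every hour) is an assumption made for illustration, as are the helper names `totals` and `keep_top` and the rule that ties are broken by IP string.

```python
from collections import Counter

# Request sizes per IP per hour. The column totals match the table in the
# text; the hour in which each non-1.1.1.1 value occurs is an assumption.
hourly = {
    "12:00": {"1.1.1.1": 99, "2.2.2.2": 100, "3.3.3.3": 100, "4.4.4.4": 100},
    "13:00": {"1.1.1.1": 99, "2.2.2.2": 100, "4.4.4.4": 100, "5.5.5.5": 100},
    "14:00": {"1.1.1.1": 99, "3.3.3.3": 100, "4.4.4.4": 101, "5.5.5.5": 100},
    "15:00": {"1.1.1.1": 99, "3.3.3.3": 99, "4.4.4.4": 100, "5.5.5.5": 100},
    "16:00": {"1.1.1.1": 99, "2.2.2.2": 100, "5.5.5.5": 100, "6.6.6.6": 100},
}


def totals(rows):
    """Sum the per-hour values for each IP across all hours."""
    result = Counter()
    for per_ip in rows.values():
        result.update(per_ip)  # Counter.update adds values key by key
    return result


def keep_top(rows, n):
    """Keep only the n largest IPs in each hour.

    Ties are broken by IP string (higher string wins); that rule is an
    assumption of this sketch, chosen so 3.3.3.3 keeps its 99 at 15:00.
    """
    kept = {}
    for hour, per_ip in rows.items():
        ranked = sorted(per_ip.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
        kept[hour] = dict(ranked[:n])
    return kept


true_totals = totals(hourly)
summary_totals = totals(keep_top(hourly, 3))

print(true_totals.most_common(1))     # [('1.1.1.1', 495)]
print(summary_totals.most_common(1))  # [('4.4.4.4', 401)]
```

In this sketch, 1.1.1.1 is the largest contributor overall yet never ranks in the top three for any single hour, so it does not appear in the reduced data at all.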