Implementing Splunk (Update)

Implementing Splunk Second Edition

Calculating top for a large time frame


One common problem is finding the top contributors out of some huge set of unique values. For instance, if you want to know which IP addresses are using the most bandwidth in a given day or week, you may have to keep track of the total of request sizes across millions of unique hosts to answer this question definitively. When using summary indexes, this means storing millions of events in the summary index, which quickly defeats the point of summary indexes.

Just to illustrate, let's look at a simple set of data:

Time 1.1.1.1 2.2.2.2 3.3.3.3 4.4.4.4 5.5.5.5 6.6.6.6
12:00 99 100 100 100
13:00 99 100 100 100
14:00 99 100 101 100
15:00 99 99 100 100
16:00 99 100 100 100
total 495 300 299 401 400 100

If we only stored the top three IPs per hour, our data set would look like the following:

Time 1.1.1.1 2.2.2.2 3.3.3.3 4.4.4.4 5.5.5.5 6.6.6.6
12:00 100 100 100
13:00 100 100 100
14:00 100 101 100
15:00 99 100 100
16:00 100 100 100
total 300 299 401 400 100
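
To make the discrepancy concrete, the following Python sketch totals the full data set and then totals only the top three IPs kept per hour. It is an illustration, not a Splunk query. The row values and column totals are taken from the tables above, but the hour-by-hour placement of values for 2.2.2.2 through 6.6.6.6 is an assumption chosen to be consistent with those totals. The helper names (`totals`, `top_n_per_hour`) are hypothetical, and the tie at 15:00 between two values of 99 is broken by IP string so that 3.3.3.3 is kept, as in the second table.

```python
from collections import defaultdict

# Hourly request totals per IP. Values and column totals follow the tables
# above; which hours each of 2.2.2.2 .. 6.6.6.6 appears in is an ASSUMPTION
# chosen so that the per-IP totals match.
hourly = {
    "12:00": {"1.1.1.1": 99, "2.2.2.2": 100, "3.3.3.3": 100, "4.4.4.4": 100},
    "13:00": {"1.1.1.1": 99, "2.2.2.2": 100, "5.5.5.5": 100, "6.6.6.6": 100},
    "14:00": {"1.1.1.1": 99, "3.3.3.3": 100, "4.4.4.4": 101, "5.5.5.5": 100},
    "15:00": {"1.1.1.1": 99, "3.3.3.3": 99, "4.4.4.4": 100, "5.5.5.5": 100},
    "16:00": {"1.1.1.1": 99, "2.2.2.2": 100, "4.4.4.4": 100, "5.5.5.5": 100},
}


def totals(rows):
    """Sum each IP's values across all hours."""
    out = defaultdict(int)
    for row in rows.values():
        for ip, size in row.items():
            out[ip] += size
    return dict(out)


def top_n_per_hour(rows, n):
    """Keep only the n largest IPs in each hour.

    Ties are broken by IP string (an arbitrary choice made for this sketch).
    """
    return {
        hour: dict(
            sorted(row.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)[:n]
        )
        for hour, row in rows.items()
    }


full_totals = totals(hourly)
summary_totals = totals(top_n_per_hour(hourly, 3))

true_top = max(full_totals, key=full_totals.get)
summary_top = max(summary_totals, key=summary_totals.get)

print(true_top, full_totals[true_top])           # 1.1.1.1 495
print(summary_top, summary_totals[summary_top])  # 4.4.4.4 401
print("1.1.1.1" in summary_totals)               # False
```

Because 1.1.1.1 is never among the top three in any single hour, it never reaches the reduced data set, even though it has the largest total over the whole period.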

According...