Book Image

Splunk Best Practices

Book Image

Splunk Best Practices

Overview of this book

This book will give you an edge over others through insights that will help you in day-to-day instances. When you're working with data from various sources in Splunk and performing analysis on this data, it can be a bit tricky. With this book, you will learn the best practices of working with Splunk. You'll learn about tools and techniques that will ease your life with Splunk, and will ultimately save you time. In some cases, it will adjust your thinking of what Splunk is, and what it can and cannot do. To start with, you'll get to know the best practices to get data into Splunk, analyze data, and package apps for distribution. Next, you'll discover the best practices in logging, operations, knowledge management, searching, and reporting. To finish off, we will teach you how to troubleshoot Splunk searches, as well as deployment, testing, and development with Splunk.
Table of Contents (16 chapters)

Chapter 1. Application Logging

Within the working world of technology, there are hundreds of thousands of different applications, all (usually) logging in different formats. As Splunk experts, our job is make all those logs speak human, which is often an impossible task. With third-party applications that provide support, sometimes log formatting is out of our control. Take for instance, Cisco or Juniper, or any other leading application manufacturer. We won't be discussing these kinds of logs in this chapter, but we'll discuss the logs that we do have some control over.

The logs I am referencing belong to proprietary in-house (also known as "home grown") applications that are often part of middleware, and usually they control some of the most mission-critical services an organization can provide.

Proprietary applications can be written in any language. However, logging is usually left up to the developers for troubleshooting and up until now the process of manually scraping log files to troubleshoot quality assurance issues and system outages has been very specific. I mean that usually, the developer(s) are the only people that truly understand what those log messages mean.

That being said, oftentimes developers write their logs in a way that they can understand them, because ultimately it will be them doing the troubleshooting/code fixing when something breaks severely.

As an IT community, we haven't really started looking at the way we log things, but instead we have tried to limit the confusion to developers, and then have them help other SME's that provide operational support to understand what is actually happening.

This method is successful, however, it is slow, and the true value of any SME is reducing any system's MTTR, and increasing uptime.

With any system, the more transactions processed means the larger the scale of the system, which means that, after about 20 machines, troubleshooting begins to get more complex and time consuming with a manual process.

This is where something like Splunk can be extremely valuable. However, Splunk is only as good as the information that comes into it.

I will say this phrase for the people who haven't heard it yet; "garbage in... garbage out"

There are some ways to turn proprietary logging into a powerful tool, and I have personally seen the value of these kinds of logs. After formatting them for Splunk, they turn into a huge asset in an organization's software life cycle.

I'm not here to tell you this is easy, but I am here to give you some good practices about how to format proprietary logs.

To do that I'll start by helping you appreciate a very silent and critical piece of the application stack.

Note

To developers, a logging mechanism is a very important part of the stack, and the log itself is mission critical. What we haven't spent much time thinking about before log analyzers, is how to make log events/messages/exceptions more machine friendly so that we can socialize the information in a system like Splunk, and start to bridge the knowledge gap between development and operations.

The nicer we format the logs, the faster Splunk can reveal the information about our systems, saving everyone time and headaches.

In this chapter we are briefly going to look at the following topics:

  • Log messengers

  • Logging formats

  • Correlation IDs and why they help

  • When to place correlation ID in a log