Chapter 9. Best Practices and Advanced Queries

As we bring this book to a close, we want to leave you with a few extra skills in your Splunk toolkit. Throughout the book, you have gained the essential skills required to use Splunk effectively. In this chapter, we will look at some best practices that you can incorporate in your daily Splunk work. These include the following:

  • Temporary indexes and oneshot indexing
  • Searching within an index
  • Searching within a limited time frame
  • How to do quick searches via fast mode
  • How to use event sampling
  • Using the universal forwarder

We will also list some advanced SPL queries that you can use as templates when the need arises. These include:

  • Doing a subsearch, or a search within a search
  • Using append and join
  • Using eval with if
  • Using eval with match

Throughout this book, we have seen how logs can be used to improve applications and to troubleshoot problems. Since logs are such an important component of using data with Splunk, we end the chapter with a few basics, recommended by Splunk, that should be remembered when you are creating logs. These are:

  • Include clear key-value pairs
  • Create events that are understandable to human readers
  • Remember to use timestamps for all events
  • Be sure your identifiers are unique
  • Log using text format, not binary
  • Use formats that developers can easily use
  • Log what you think might be useful at some point
  • Create use categories with meaning
  • Include the source of the log event
  • Minimize the number of multi-line events

Temporary indexes and oneshot indexing

When you need to index new data and you are unfamiliar with its format, it is always best practice to use a temporary index. You should begin by creating a temporary index just for this purpose. Once you have this temporary index, you can use a Splunk command to add the file once. This process is called  oneshot indexing. This is crucial when you know you have to transform the data prior to indexing, for instance when using props.conf and transforms.conf. A nice feature of oneshot indexing is that there is no need for any kind of configuration before uploading.
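If you prefer to create the temporary index from the CLI, the following one-line sketch does it; the index name TempIndex simply matches the examples that follow. You can also create an index from the UI under Settings | Indexes | New Index.

C:\> c:\splunk\bin\splunk add index TempIndex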

Here is how you perform oneshot indexing using the CLI:

C:\> c:\splunk\bin\splunk add oneshot TestFile.log -index TempIndex -sourcetype TempSourceType

You can also do this from the UI by going to Settings | Data inputs | Files and Directories | Add new. Then browse for the file and click on Index Once.

If you need to remove the data you just loaded, use the clean command. The clean command only works when Splunk is stopped, and it will warn you if it is not. Once again, it is crucial that you indicate the specific index using the -index parameter; without it, the command will clean all indexes.

For example, to clean the index we created previously, we use the following command:

C:\> c:\splunk\bin\splunk clean eventdata -index TempIndex

Searching within an index

Always remember to filter your searches by index. By doing so, you can dramatically speed up your searches. If you don't restrict your search to a specific index, it means Splunk has to go through all available indexes and execute the search against them, thus consuming unnecessary time.

When designing your Splunk implementation, how you partition your indexes is also crucial. Careful thought should go into planning the indexes and how data is divided among them. In my experience, it is best to create an index for every type of source included in your incoming data.

For example, all web server logs for the same application should be placed in one index. You may then split the log types by source type, but keep them within the same index. This will give you a generally favorable search speed even if you have to search between two different source types.

Here are some examples:

Index name    Source type
App1          Logs.Error
App1          Logs.Info
App1          Logs.Warning
App2          Logs.Error
App2          Logs.Info
App3          Logs.Warning

As you can see, we have indexed by app number first, then created various subtypes. You may then create a search query within the same index, even if you have to combine two source types:

  • A good query will be as follows:
SPL> index=App1 (sourcetype=Logs.Error OR sourcetype=Logs.Warning)
  • A bad query will be as follows:
SPL> sourcetype=Logs.* Error

The way we have set it up here, if you ever have to retrieve data from both indexes, then you can combine them with the following query. It is not as efficient as searching against a single index, but it is better than going through all other available indexes:

SPL> index=App1 OR index=App2 sourcetype=Logs.Error
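If you are not sure which indexes exist in your environment, you can list them quickly before deciding where to search. The following is one common approach, shown here as a sketch; eventcount is a generating command, so the query starts with a leading pipe:

SPL> | eventcount summarize=false index=* | dedup index | table index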

Search within a limited time frame

By default, the Search and Reporting app's time range is set to All Time. Searches done using this time frame will have a negative performance impact on your Splunk instance. This is heightened when there are concurrent users doing the same thing. Although you can train your users to always select a limited time range, not everybody will remember to do this.

The solution for this problem is fairly simple. You can simply change the default time range for the drop-down menu. We will do this by modifying the ui-prefs.conf file in an administrative command prompt.

Go ahead and execute the following command:

C:\> notepad c:\Splunk\etc\system\local\ui-prefs.conf

Copy and paste the following into the file:

[search] 
dispatch.earliest_time = -4h 
dispatch.latest_time = now 
          
[default] 
dispatch.earliest_time = -4h 
dispatch.latest_time = now 

Save the file and restart Splunk. Go back to the Search and Reporting app and the default time range should now say Last 4 hours. Note that this will also change the default time range in the Search dashboard of the Destinations app, since any change in the default will be automatically applied to all apps, unless specified otherwise.

This is a good way to ensure that your users will not accidentally run search queries against all existing data without a time frame.
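You can also enforce a limited time range inside an individual query by using the earliest and latest time modifiers. Here is a sketch that reuses the index and field names from this chapter's examples:

SPL> index=main earliest=-4h latest=now http_status_code=500 
     | stats count by server_ip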

Quick searches via fast mode

There are three types of searching available in Splunk: Fast Mode, Smart Mode, and Verbose Mode.


If you want your searches to be faster, use Fast Mode. Fast mode does not attempt to generate fields at search time, unlike the default smart mode, so it works best when you already know which fields you need. Smart Mode looks for transforming commands in your searches: if it finds them, it acts like fast mode; if it doesn't, it acts like verbose mode. Verbose Mode returns as much information as possible, even though this may result in significantly slower searches.

Using event sampling

New to version 6.4 is event sampling. Just as a single drop of blood is enough to measure your sugar and sodium levels, a small sample of a dataset is often enough to draw conclusions about the whole. Event sampling is a particularly useful addition to the Splunk toolset because there is often a huge amount of data available, and what you really want is to take quick measurements from it.


Event sampling uses a sample ratio value that reduces the number of results. If a typical search returns 1,000 events, a 1:10 event sampling ratio will return roughly 100 of them. These ratios can significantly cut the amount of data a search has to process, and they range from fairly large ratios (which can be set using the Custom setting) down to ratios as small as 1:100,000, or even smaller, again using the Custom setting.

This is not suitable for saved searches where you need accurate counts. It is, however, perfect when you are testing your search queries, as they will return significantly faster. Much of your time in Splunk is spent writing and refining SPL queries, and running every iteration against a large dataset slows that work down. Use event sampling during this phase and you will reduce the time it takes to arrive at a useful search.

The following steps outline this process:

  • Do a quick search to ensure that the correct data is present
  • Look over the characteristics of the events and determine how you want to analyze them
  • Set your event sampling for the level you find useful and efficient for this stage in the process
  • Test your search commands against the resulting subset of data
  • Keep going through this process until you have a search that you are happy with

When you are done, make sure to reset event sampling to No Event Sampling before saving your search query to a dashboard; otherwise, the sampling setting will be carried over into the dashboard panel and its results will reflect only a sample of the data rather than the complete dataset.

Splunk Universal Forwarders

Although detailed descriptions of Splunk Universal Forwarders will not be part of this book, it is good to mention that on large-scale Splunk implementations, data gathering should, as much as possible, be done using these. Their usefulness lies in the fact that they are lightweight applications that can run on many different operating systems and can quickly and easily forward data to the Splunk indexer.

Throughout this book, we have indexed files locally on your machine. In production environments, with many different types of deployment and using many different machines, each machine where data resides will have a Universal Forwarder.

When the implementation is large and includes many different machines, Universal Forwarders can be managed centrally through Splunk's Forwarder Management interface, which is built on the deployment server.

These forwarders, and the ease with which they can be managed, are among the reasons for Splunk's growing popularity. Sizeable organizations find it much easier to bring in, understand, and use their data for decision-making when they can rely on the capabilities of Splunk's Universal Forwarders. The adjective Universal reflects the fact that they can forward almost any type of data imaginable, which multiplies their usefulness.
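As a rough illustration of how little configuration a Universal Forwarder needs, the following outputs.conf sketch points a forwarder at a single indexer. The hostname is hypothetical, and 9997 is simply the conventional receiving port:

[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = splunk-indexer.example.com:9997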

Advanced queries

There are various kinds of advanced queries that you may want to consider as you plan how to create searches and dashboards for your data. The ones presented here will help you design queries that are more efficient and effective.

Subsearch

A subsearch is a search within a search. If your main search requires data that is the result of another search, you can use Splunk's subsearch capability to achieve this. Say you want to find statistics about the server that generates the most 500 errors. A 500 error is a general HTTP status code indicating that something has gone wrong on the server, without specifying exactly what. Obviously, if you are responsible for running a website, you want to pay close attention to 500 errors and where they are coming from. You can achieve your goal of finding the culprit server with two searches.

The first search, shown next, will return the server address with the most 500 errors. Note you are setting the limit to 1 and giving the instructions (using the + sign) to just include the server_ip field:

SPL> index=main http_status_code=500 | top limit=1 server_ip 
     | fields + server_ip

The result of this code is as follows:

10.2.1.34

In the second search, you filter your query using the server_ip value returned by the first search, and ask for the top values of the http_uri and client_ip fields. In other words, you are asking for the top http_uri and client_ip values from the server that produced the most 500 errors, which will be very useful as you try to pinpoint the exact problem with the web server:

SPL> index=main server_ip=10.2.1.34 | top http_uri, client_ip

You can combine these two searches into one using a subsearch. Note that the subsearch appears within square brackets and is processed before the outer search runs:

SPL> index=main [ search index=main http_status_code=500 
     | top limit=1 server_ip
     | fields + server_ip ] | top http_uri, client_ip

A subsearch can also be useful if you want to search on data that depends on what you have found in another dataset. For example, consider a case where you have two or more indexes for various application logs. You can set up a search that tells you which values of a shared field appear in one source but not the other. An example of how you can do this is shown here:

SPL> sourcetype=a_sourcetype NOT [search sourcetype=b_sourcetype 
     | fields field_val]

The default number of subsearch results is limited to 100, because a subsearch that returns a very large number of results tends to degrade performance.
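If you need to adjust this behavior, subsearch limits live in limits.conf. The following stanza is a sketch only; the setting names shown here exist, but the defaults vary by Splunk version, so check the documentation for your release before changing them:

[subsearch]
# maximum number of results a subsearch may return
maxout = 500
# maximum number of seconds a subsearch may run
maxtime = 120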

Using append

Once you have done a subsearch, you may want to add the results of that subsearch to another set of results. If that is the case, and you are using historical data, use the syntax provided here to append the subsearch:

SPL> ... | append [subsearch]

You can also specify various timing options if you like.
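For example, the following sketch (reusing the index and http_status_code field from this chapter's examples) appends yesterday's error count for the same hour to the count for the last hour, so both figures appear in one result set:

SPL> index=main http_status_code=500 earliest=-1h 
     | stats count AS errors_last_hour 
     | append [ search index=main http_status_code=500 earliest=-25h latest=-24h 
       | stats count AS errors_same_hour_yesterday ]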

Using join

You can also use the join command to join the results of the subsearch to your main search results, but you will likely often opt to use append instead, if you have historical data. Again, the basic syntax is simple:

SPL> ... | join [subsearch]

This will default to an inner join, which includes only events shared in common by the two searches. You can also specify an outer or left join. The outer join contains all the data, whereas the left join contains the data from events fulfilling the left search, as well as the events that are shared in common. You can also specify a field list for the join, instead of including all fields by default.
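As a sketch, the following query (again reusing this chapter's index and fields) left-joins each server's 500-error count with its total request count so that an error rate can be calculated:

SPL> index=main http_status_code=500 | stats count AS error_count by server_ip 
     | join type=left server_ip [ search index=main | stats count AS total_requests by server_ip ] 
     | eval error_rate=round(error_count / total_requests * 100, 2)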

Using eval and if

If you need to create a field based on the data present in an event, you can use the eval command to create a field variable and use if to check for that condition.

The if function takes the form of:

SPL> | eval newfield=if(condition, value_if_true, value_if_false)

Say you want to create an additional field at search time that indicates whether or not a destination is on the East Coast. Using the code presented next, if a destination URI has NY, MIA, or MCO in it, a new field called Region with the value East will be added to each of those events; otherwise, Region will be set to Others. Once that has been done, the code lists the newly created Region field and http_uri for all events, and sorts by Region:

SPL> index=main http_uri="/destination/*/details" 
 | eval Region=if(match(http_uri, "NY|MIA|MCO"), "East", "Others") 
 | top 0 Region, http_uri | sort Region

A small regular expression, NY|MIA|MCO, is used here as an alternation across the airport codes. If http_uri contains NY, MIA, or MCO, the Region field value will be East; otherwise it will be Others.

This should now return the data with the new Region field.


Using eval and match with a case function

You can refine this query by using the case function instead of if, and account for the West and Central regions as well.

We introduce the case function here. In the following example, the value of a field is set to Label1 if Condition1 is true, Label2 if Condition2 is true, and so on:

SPL> | eval Value=case(Condition1, "Label1", Condition2, "Label2", 
      ConditionX, "LabelX")

Let us tweak the previous query to use case instead of if:

SPL> index=main http_uri="/destination/*/details" 
     | eval Region=case(match(http_uri, "NY|MIA|MCO"), 
           "East",  match(http_uri, "WAS|AK|LAX|PML"), "West", 
           match(http_uri, "HOU"), "Central") 
     | top 0 Region, http_uri | sort Region

The result will now properly classify the destinations based on the region.



How to improve logs

Throughout this book, we have seen examples of how logs can be used to make applications more effective, and we have talked about how logs can be used to troubleshoot problems. In this last section, we discuss some basics, recommended by Splunk, that you should keep in mind when creating logs.

Including clear key-value pairs

It is important to remember that data should be structured using clear key-value pairs. Doing so helps Splunk carry out automatic field extraction as intended, and do so faster and more efficiently. Remember that this is one of the most useful features of Splunk!

A model for doing this is shown here:

key1=value1, key2=value2, ...

As you do this, remember that if it is important to include spaces in the values, in text fields, for example, you should surround the value with quotes:

key1="value1" or user="Matt Nguyen" 

Although you may find this method lengthier and more verbose, it conveys a real advantage when it comes to field extraction, because extraction can then happen automatically.
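Putting these points together, a log event written with clear key-value pairs might look like the following line. This is purely illustrative; the field names and values are not taken from any real system:

2016-07-21 09:15:27.234912 +0000 level=INFO action=user_login user="Matt Nguyen" client_ip=10.2.1.34 status=success duration_ms=84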

Creating events that are understandable to human readers

If possible, avoid formats that force readers to perform lookups just to understand the meaning of your data. One way to do this is to use tools that convert binary data to ASCII or text data, and to use the same format throughout the file. If, for some reason, you have to use different formats, simply create separate files.

Remember to use timestamps for all events

There are many reasons why you should use timestamps for all events. Above all, timestamps help determine the sequence in which events occurred, which makes them invaluable for problem-solving, data analytics, and many other uses.

Also remember the following:

  • Include the timestamp at the beginning of each line, making it easier to find.
  • Use four digits for the year, for readability and identification purposes.
  • Be sure to include a time zone. Here it is best to include the standard GMT/UTC offset format.
  • Measure time to the microsecond. This could be helpful for identification of sequences or problem-solving at some point.
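For example, a timestamp that follows all four of these guidelines might look like this (an illustrative sketch only):

2016-07-21 09:15:27.234912 +0000 level=WARN message="Payment gateway response took 4.2s"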

Be sure your identifiers are unique

This is an obvious rule and is familiar to anyone who has used or studied transactional data, but nonetheless we include it here, just because it is so important.

Log using text format, not binary

It is very hard for Splunk to search binary data meaningfully. Therefore, wherever possible, create logs in text format. If, for some reason, your data has to be in binary format, be sure to include metadata in text format so that it can be searched easily.

Use formats that developers can use easily

It is important to consider the usefulness of your log format. When setting up logs, use formats that are easy for developers to understand. One especially useful format is JavaScript Object Notation (JSON). Extensible Markup Language (XML) can also be used, but JSON is somewhat cleaner to read.

JSON can be parsed natively by most programming languages, and in Splunk the spath command lets you extract fields from this kind of structured data at search time. Using the structured key-value pairs of JSON, you can easily represent data with a built-in hierarchy, such as this email event, where recipient has a sub-level:

{
  "sender": "george",
  "recipient": { "firstnames": ["michael", "shannon", "chloe"] },
  "subject": "Building my logs"
}
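Once events like this are indexed, a sketch of how spath might pull out the nested values could look like the following. The sourcetype name email_json is hypothetical, and the path assumes the firstnames array shown in the event above:

SPL> index=main sourcetype=email_json 
     | spath output=recipient_name path=recipient.firstnames{} 
     | stats count by recipient_name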

Log what you think might be useful at some point

As you look at your data, consider what might come in handy for answering business questions at some point. What would be useful for decision-making? What would be useful for problem-solving? What data might you want someday for a chart or graph? Be sure to log any information you think you might need in the future.

Create use categories with meaning

Be sure your categories convey meaning. Especially important are labels such as INFO, WARN, ERROR, and DEBUG, which clearly flag events that you want to pay attention to.

Include the source of the log event

Be sure to include information that conveys where the event came from, be it a file, a function, or a class.

Minimize the number of multi-line events

In general, it is good to minimize the number of events that include many lines. Sometimes, you will have to leave them as multi-line, but this can slow down indexing and search speeds. Other times, you may want to turn multi-line events into shorter events.
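When you control how the data is indexed rather than how it is written, line breaking is governed by props.conf. The following sketch (reusing the TempSourceType source type from earlier in the chapter) tells Splunk to treat every line as its own event instead of merging lines together:

[TempSourceType]
# do not merge consecutive lines into one event
SHOULD_LINEMERGE = false
# break events at every newline
LINE_BREAKER = ([\r\n]+)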


Summary

In this chapter, you have learned some best practices to employ when using Splunk. You were also shown complex queries that can further enhance your result set.

This brings our book to a close. We hope that you have enjoyed this adventure with Splunk. If you have completed (or even mostly completed) the steps in this book, you should now have a strong working knowledge of this important software. Splunk appears, as we write, to be growing more and more successful in the marketplace, and it is positioned to become even more important as the Internet of Things (IoT) extends its influence on the daily lives of individuals and businesses. Splunk is a skill that will help you navigate the exciting world of data and all the advantages it will bring to our future.