Advanced Splunk

By : Ashish Kumar Tulsiram Yadav

Overview of this book

Master the power of Splunk and learn advanced strategies to get the most out of your machine data with this practical guide. Make sense of the hidden data of your organization – the insight of your servers, devices, logs, traffic, and clouds. Advanced Splunk shows you how. Dive deep into Splunk to find the most efficient solution to your data problems. Create the robust Splunk solutions you need to make informed decisions in big data machine analytics. From visualizations to enterprise integration, this well-organized, high-level guide has everything you need for Splunk mastery. Start with a complete overview of the new features and advantages of the latest version of Splunk and the Splunk environment. Go hands-on with uploading data, search commands for basic and advanced analytics, advanced visualization techniques, and dashboard customization. Discover how to tweak Splunk to your needs, and get a complete guide to enterprise integration of Splunk with various analytics and visualization tools. Finally, discover how to set up and use the new features of the latest version of Splunk.

Intelligent job scheduling

This section will explain in detail how Splunk Enterprise handles scheduled reports in order to run them concurrently. Splunk uses a report scheduler to manage scheduled alerts and reports. Depending on the configuration of the system, the scheduler sets a limit on the number of reports that can be run concurrently on the Splunk search head. Whenever the number of scheduled reports crosses the threshold limit set by the scheduler, it has to prioritize the excess reports and run them in order of their priority.

The scheduler sets this limit so that system performance is not degraded and no report is skipped disproportionately more often than others. Generally, reports are skipped when slow-to-complete reports crowd out quick-to-complete reports, causing the latter to miss their scheduled runtime.

The following table shows the priority order in which Splunk runs different types of searches:

Priority	Search/report type	Description
First priority	Ad hoc historical searches	Manually run historical searches always run first. Ad hoc search jobs are given higher priority than scheduled reports.
Second priority	Manually scheduled reports and alerts with real-time scheduling	Reports scheduled manually use the real-time scheduling mode by default. They are prioritized over reports with continuous scheduling to reduce skipping of manually scheduled reports and alerts.
Third priority	Manually scheduled reports with continuous scheduling	The continuous scheduling mode is used by scheduled reports that populate summary indexes, and by other reports.
Last priority	Automatically scheduled reports	Scheduled reports related to report acceleration and data model acceleration fall into this category. These reports are always given last priority.

Caution

Do not change these settings unless you understand what they do.

Splunk determines the limit automatically from the system-wide maximum number of concurrent historical searches, which depends on the values of max_searches_per_cpu and base_max_searches in the limits.conf file located at $SPLUNK_HOME\etc\system\local.

The default value of base_max_searches is 6.

The maximum number of concurrent historical searches is calculated as follows:

Maximum number of concurrent historical searches = (max_searches_per_cpu * number of CPUs) + base_max_searches

So, for a system with two CPUs and max_searches_per_cpu set to 1, the value is 8, as the following worked example shows:

Maximum number of concurrent historical searches = (1 * 2) + 6 = 8
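
The same arithmetic can be written as a small Python helper. This is only an illustrative sketch: the function name is hypothetical, the default of 6 for base_max_searches comes from the text above, and the default of 1 for max_searches_per_cpu is taken from the worked example.

```python
def max_concurrent_historical_searches(cpu_count,
                                       max_searches_per_cpu=1,
                                       base_max_searches=6):
    """Apply the formula above: (max_searches_per_cpu * CPUs) + base_max_searches."""
    return max_searches_per_cpu * cpu_count + base_max_searches


# Two CPUs with the values used in the worked example.
print(max_concurrent_historical_searches(2))  # prints 8
```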

The max_searches_perc parameter sets the percentage of the maximum number of concurrent historical searches that the report scheduler may use, so it can be tuned to allow more or fewer concurrent scheduled reports as required. With a value of 50 percent on a system with two CPUs, the report scheduler can run only four scheduled reports at a time, that is, 50 percent of 8 = 4.
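
As a rough sketch, the scheduler limit is a percentage of the historical search limit. The function name below is hypothetical, and rounding down to a whole number is an assumption made for illustration; the text does not say how Splunk rounds fractional results.

```python
def max_concurrent_scheduled_reports(max_historical_searches,
                                     max_searches_perc=50):
    """Share of the historical search limit available to the report scheduler.

    Rounding down is an assumption for this sketch.
    """
    return max_historical_searches * max_searches_perc // 100


# Two-CPU system from the text: 50 percent of 8 concurrent historical searches.
print(max_concurrent_scheduled_reports(8))  # prints 4
```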

To make full use of the Splunk scheduler, the scheduler limit can vary by time, so that more concurrent scheduled reports are allowed in some periods and fewer in others.

Now, let's configure intelligent job scheduling. Modify the limits.conf file located in the $SPLUNK_HOME\etc\system\local directory. Set each max_searches_perc.n to the appropriate percentage for a specific cron period, where n is a number starting at 0 and max_searches_perc.n.when holds the cron expression for that period:

# Scheduler settings belong to the [scheduler] stanza of limits.conf.
[scheduler]

# The default limit, used when the periods defined below are not in effect.
max_searches_perc = 50

# Raise the limit between 00:00 and 05:59 every day, when there is less load on the server.
max_searches_perc.0 = 70
max_searches_perc.0.when = * 0-5 * * *

# Raise the limit even more between 00:00 and 05:59 on Saturdays and Sundays.
max_searches_perc.1 = 90
max_searches_perc.1.when = * 0-5 * * 0,6
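
To see which percentage such a configuration would give at a particular moment, the periods can be evaluated with a short Python sketch. Everything here is illustrative: the function names are hypothetical, the cron matcher handles only `*`, single numbers, ranges, and comma lists, and the rule that a later-numbered period overrides an earlier one when both match is an assumption, not something stated in the text.

```python
from datetime import datetime


def field_matches(field, value):
    """Match one cron field supporting '*', 'a', 'a-b', and comma lists."""
    if field == "*":
        return True
    for part in field.split(","):
        if "-" in part:
            low, high = part.split("-")
            if int(low) <= value <= int(high):
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr, when):
    """Check a five-field cron expression (minute hour day month weekday)."""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = when.isoweekday() % 7  # cron convention: 0 = Sunday ... 6 = Saturday
    return (field_matches(minute, when.minute)
            and field_matches(hour, when.hour)
            and field_matches(dom, when.day)
            and field_matches(month, when.month)
            and field_matches(dow, cron_dow))


def effective_perc(default, periods, when):
    """Return the percentage in effect at 'when'.

    Assumption: if several periods match, the later-numbered one wins.
    """
    perc = default
    for value, expr in periods:
        if cron_matches(expr, when):
            perc = value
    return perc


# Periods from the limits.conf example, in order of their index n.
PERIODS = [(70, "* 0-5 * * *"), (90, "* 0-5 * * 0,6")]

print(effective_perc(50, PERIODS, datetime(2016, 1, 4, 3, 0)))   # Monday 03:00 -> 70
print(effective_perc(50, PERIODS, datetime(2016, 1, 2, 3, 0)))   # Saturday 03:00 -> 90
print(effective_perc(50, PERIODS, datetime(2016, 1, 4, 12, 0)))  # Monday 12:00 -> 50
```

Note that the hour field `0-5` covers every minute from 00:00 through 05:59, so 06:00 already falls back to the default.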

There are two scheduling modes for manually scheduled reports, which are as follows:

Real-time scheduling: In this type of scheduling, Splunk ensures that the most recent run of the report returns current data. This means that a scheduled report with real-time scheduling runs at its scheduled runtime or not at all.
If longer-running reports have not finished, or many reports with real-time scheduling are set to run at the same time, some of the reports with real-time scheduling may be skipped.
The report scheduler prioritizes reports with real-time scheduling over reports with continuous scheduling.
Continuous scheduling: Continuous scheduling is used where every run of the report must eventually happen. If a report with continuous scheduling cannot run at its scheduled time for any reason, it runs later, after other reports have finished.
All scheduled reports are, by default, set to real-time scheduling unless they are enabled for summary indexing. For summary indexing, the scheduling mode is set to continuous scheduling because summary indexes become unreliable if the scheduled reports that populate them are skipped.
If there is a server failure or Splunk Enterprise is shut down for some reason, reports configured for continuous scheduling will miss their scheduled runtimes. When Splunk Enterprise comes back online, the report scheduler can make up all the runs of continuously scheduled reports missed in the last 24 hours, provided that each report ran on its schedule at least once before the Splunk Enterprise instance went down.
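
The difference between the two modes can be illustrated with a toy model of a single scheduling moment. This is a deliberately simplified sketch, not Splunk's actual scheduler: the `Report` class, the `run_tick` function, and the notion of `free_slots` are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Report:
    name: str
    realtime_schedule: bool  # True = real-time scheduling, False = continuous


def run_tick(due, free_slots):
    """Return (started, skipped, deferred) for reports due at the same moment."""
    # Real-time reports are considered first; sorted() is stable, so the
    # original order is kept within each scheduling mode.
    ordered = sorted(due, key=lambda r: not r.realtime_schedule)
    started = ordered[:free_slots]
    leftover = ordered[free_slots:]
    # A real-time report that cannot start now is skipped for this run.
    skipped = [r for r in leftover if r.realtime_schedule]
    # A continuous report that cannot start now is deferred and runs later.
    deferred = [r for r in leftover if not r.realtime_schedule]
    return started, skipped, deferred


due = [
    Report("summary_index_report", realtime_schedule=False),
    Report("alert_a", realtime_schedule=True),
    Report("alert_b", realtime_schedule=True),
    Report("alert_c", realtime_schedule=True),
]
started, skipped, deferred = run_tick(due, free_slots=2)
print([r.name for r in started])   # ['alert_a', 'alert_b']
print([r.name for r in skipped])   # ['alert_c']
print([r.name for r in deferred])  # ['summary_index_report']
```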

Let's configure the scheduling mode next. To put a scheduled report in real-time scheduling mode or in continuous scheduling mode, manually set the realtime_schedule parameter in the savedsearches.conf file to 1 or 0, respectively. Both values are explained as follows:

realtime_schedule = 0: This value puts the scheduled report in continuous scheduling mode, which ensures that the report never skips a run. If the report cannot run at its scheduled time, it runs later, when other reports have finished.
realtime_schedule = 1: This value makes the scheduled report run at its scheduled start time. If the report cannot start because other reports are running, it skips that scheduled run. This is the default scheduling mode for new reports.
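
As a minimal sketch of making this change programmatically, the snippet below uses Python's standard configparser module on an in-memory sample. The stanza name and its search are hypothetical, and treating savedsearches.conf as a plain INI file is an assumption that holds only for simple stanzas like this one; multi-line values or other Splunk-specific syntax would need more care.

```python
import configparser
import io

# Hypothetical saved search stanza, used only for illustration.
SAMPLE = """\
[Hypothetical Summary Report]
search = index=_internal | stats count
enableSched = 1
cron_schedule = */5 * * * *
realtime_schedule = 1
"""


def set_scheduling_mode(conf_text, stanza, continuous):
    """Return conf_text with realtime_schedule set to 0 (continuous) or 1 (real-time)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case, for example enableSched
    parser.read_string(conf_text)
    parser[stanza]["realtime_schedule"] = "0" if continuous else "1"
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


updated = set_scheduling_mode(SAMPLE, "Hypothetical Summary Report", continuous=True)
print(updated)
```

On a real instance, the edited file would still have to be saved to the appropriate savedsearches.conf location and picked up by Splunk before the change takes effect.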
