Book Image

Advanced Splunk

By : Ashish Kumar Tulsiram Yadav
Book Image

Advanced Splunk

By: Ashish Kumar Tulsiram Yadav

Overview of this book

Master the power of Splunk and learn the advanced strategies to get the most out of your machine data with this practical advanced guide. Make sense of the hidden data of your organization – the insight of your servers, devices, logs, traffic and clouds. Advanced Splunk shows you how. Dive deep into Splunk to find the most efficient solution to your data problems. Create the robust Splunk solutions you need to make informed decisions in big data machine analytics. From visualizations to enterprise integration, this well-organized high level guide has everything you need for Splunk mastery. Start with a complete overview of all the new features and advantages of the latest version of Splunk and the Splunk Environment. Go hands on with uploading data, search commands for basic and advanced analytics, advanced visualization techniques, and dashboard customizing. Discover how to tweak Splunk to your needs, and get a complete on Enterprise Integration of Splunk with various analytics and visualization tools. Finally, discover how to set up and use all the new features of the latest version of Splunk.
Table of Contents (20 chapters)
Advanced Splunk
Credits
About the Author
Acknowledgements
About the Reviewer
www.PacktPub.com
Preface
Index

Intelligent job scheduling


This section will explain in detail how Splunk Enterprise handles scheduled reports in order to run them concurrently. Splunk uses a report scheduler to manage scheduled alerts and reports. Depending on the configuration of the system, the scheduler sets a limit on the number of reports that can be run concurrently on the Splunk search head. Whenever the number of scheduled reports crosses the threshold limit set by the scheduler, it has to prioritize the excess reports and run them in order of their priority.

The limit is set by a scheduler so as to make sure that the system performance is not degraded and fewer or no reports get skipped disproportionally more than others. Generally, reports are skipped when slow-to-complete reports crowd out quick-to-complete reports, thus causing them to miss their scheduled runtime.

The following table shows the priority order in which Splunk runs different types of searches:

Priority

Search/report type

Description

First priority

Ad hoc historical searches

  • Manually run historically searches always run first

  • Ad hoc search jobs are given more priority than scheduled ad hoc search reports

Second priority

Manually scheduled reports and alerts with real-time scheduling

  • Reports scheduled manually use a real-time scheduling mode by default

  • Manually run searches are prioritized against reports to reduce skipping of manually scheduled reports and alerts

Third priority

Manually scheduled reports with continuous scheduling

  • The continuous scheduling mode is used by scheduled reports, populating summary indexes and other reports

Last priority

Automatically scheduled reports

  • Scheduled reports related to report acceleration and data model acceleration fall into this category

  • These reports are always given last priority

Tip

Caution:

It is suggested that you do not change the settings until and unless you are aware of what you are doing.

The limit is automatically determined by Splunk on the basis of system-wide concurrent historical searches, depending upon the values of max_searches_per_cpu, base_max_searches in the limits.conf file located at $SPLUNK_HOME\etc\system\local.

The default value of base_max_searches is 6.

It is calculated as follows:

Maximum number of concurrent historical searches = (max_searches_per_cpu * number of CPU) + base_max_searches

So, for a system with two CPUs, the value should be 8. To get a better clarity see the following worked out example:

Maximum number of concurrent historical searches = (1 * 2) + 6 = 8

The max_searches_perc parameter can be set up so that it allows more or less concurrent scheduled reports depending on the requirement. For a system with two CPUs, the report scheduler can safely run only four scheduled reports at a time (50 percent of the maximum number of concurrent historical searches), that is, 50 percent of 8 = 4.

For efficient and full use of the Splunk scheduler, the scheduler limit can vary by time. The scheduler limit can be set to whether to have fewer or more concurrent scheduled reports.

Now, let's configure intelligent job scheduling. Modify the limits.conf file located at the $SPLUNK_HOME\etc\system\local directory. The max_searches_perc.n is to be set up with appropriate percentages for specific cron periods:

# The default limit, used when the periods defined below are not in effect.
max_searches_perc = 50 

#  Change the max search percentage at 5am every day when specifically there is less load on server.
max_searches_perc.0 = 70
max_searches_perc.0.when = * 0-5 * * *

#  Change the max search percentage even more on Saturdays and Sundays
max_searches_perc.1 = 90
max_searches_perc.1.when = * 0-5 * * 0,6

There are two scheduling modes of manually scheduled reports, which are as follows:

  • Real-time scheduling: In this type of scheduling, Splunk ensures that the recent run of the report returns current data. This means that a scheduled report with real-time scheduling runs at its scheduled runtime or not at all.

    If there are longer running reports that have not finished or there are many reports with real-time scheduling set to run at the same time, then in that case, some of the real-time scheduling reports may be skipped.

    A report scheduler prioritizes reports with real-time scheduling over reports with continuous scheduling.

  • Continuous scheduling: Continuous scheduling is used in a situation where running the report is eventually required. In case a report with continuous scheduling is not able to run due to one or other reason, then it will run in future after other reports are finished.

    All the scheduled reports are, by default, set to real-time scheduling unless they are enabled for summary indexing. In case of summary indexing, the scheduling mode is set to continuous scheduling because summary indexes are not that reliable if scheduled reports that populate them are skipped.

    If there is any server failure or Splunk Enterprise is shut down for some reason, then in that case, the continuous scheduling mode's configured reports will miss scheduled runtime. The report scheduler can replace all the missed runs of continuously scheduled reports of the last 24 hours when Splunk Enterprise goes online, provided that it was at least once on its schedule before the Splunk Enterprise instance went down.

Let's configure the scheduling mode next. To configure scheduled reports so that they are in a real-time scheduling mode or in a continuous scheduling mode, the realtime_schedule parameter in the savedsearches.conf file is to be manually changed from realtime_schedule to 0 or 1. Both the scheduling modes are explained as follows:

  • realtime_schedule = 0: This mode enables scheduled reports that are to be in a continuous scheduling mode. This ensures that the scheduled reports never skip any run. If it cannot run at that moment, it will run later when other reports are over.

  • realtime_schedule = 1: This mode enables a scheduled report to run at its scheduled start time. If it cannot start due to other reports, it skips that scheduled run. This is the default scheduling mode for new reports.