Splunk 9.x Enterprise Certified Admin Guide

By: Srikanth Yarlagadda

Overview of this book

The IT sector's appetite for Splunk and skilled Splunk developers continues to surge, offering more opportunities for developers with each passing decade. If you want to enhance your career as a Splunk Enterprise administrator, then Splunk 9.x Enterprise Certified Admin Guide will not only aid you in excelling on your exam but also pave the way for a successful career. You’ll begin with an overview of Splunk Enterprise, including installation, license management, user management, and forwarder management. Additionally, you’ll delve into index management, including the creation and management of indexes used to store data in Splunk. You’ll also uncover config files, which are used to configure various settings and components in Splunk. As you advance, you’ll explore data administration, including data inputs, which are used to collect data from various sources, such as log files, network protocols (TCP/UDP), APIs, and agentless inputs (HEC). You’ll also discover search-time and index-time field extractions, which are used to create reports and visualizations and help make the data in Splunk more searchable and accessible. The self-assessment questions and answers at the end of each chapter will help you gauge your understanding. By the end of this book, you’ll be well versed in all the topics required to pass the Splunk Enterprise Admin exam and use Splunk features effectively.

Splunk Validated Architectures (SVAs)

This section is completely optional, as the topic isn’t included in the Splunk admin exam blueprint. However, I recommend going through it to familiarize yourself with what Splunk’s architecture looks like and where the processing and management components are positioned and interconnected.

So far, we have learned about Splunk Enterprise’s features and components and their roles in a standalone or distributed deployment. It is time to see some of the deployment architectures, called SVAs, curated by the best minds at Splunk Inc.

Just as there is more than one solution to a problem, a single architecture might not fit every organization. Splunk Enterprise architects and admins weigh many variables to arrive at a suitable design, and SVAs offer them guidance in the form of best practices and readily available, off-the-shelf designs. A Splunk Enterprise architect’s roles and responsibilities differ from those of a typical admin. Splunk Education offers courses to prepare you to become a Splunk Enterprise-certified architect, and the Splunk Enterprise Admin certification is a prerequisite.

Let’s go through some of the prominent validated architectures of Splunk Enterprise on-premises. A full list of SVAs is available here:

Single-server deployment

A single-server deployment, also known as a standalone deployment, consists of a single Splunk Enterprise instance that combines both SH and indexer functionality.

The following diagram shows the deployment architecture:

Figure 1.4: Standalone deployment architecture

The diagram shows a standalone/single Splunk instance, a collection tier forwarding events to a single instance, and an optional DS to manage the collection tier/forwarders.

The advantages of this deployment type are that it is cost-effective and easy to manage.

Let’s look at the limitations of this deployment type, as follows:

  • Works for a limited number of users, between four and eight
  • Data indexing size is limited to below 300 GB per day
  • Does not work effectively for critical searches
  • No high availability and disaster recovery
  • Capacity is bounded by a single server, although migrating to a distributed deployment is straightforward with additional hardware

Let’s take a look at distributed non-clustered deployment, which is a more advanced setup than a single-server deployment.

Distributed non-clustered deployment

A distributed non-clustered deployment handles more search workload and indexing capacity than a single-server deployment. However, separating the SH and indexing duties increases the total cost of ownership (TCO).

The following diagram shows the non-clustered deployment architecture with separate SH and indexing tiers:

Figure 1.5: Distributed non-clustered architecture

In the depicted architecture, the search tier comprises an SHC, and the indexing tier comprises multiple standalone indexers. The SHC-D is a mandatory management component that deploys configurations to the SHC by pushing them as apps. A DS manages the forwarders, while an LM serves as the central repository for license details, with all other instances connecting to it for license information.

Let’s look at the advantages of this deployment over single-server deployment:

  • It supports more users than a single-server deployment, thanks to the additional indexers
  • Independent indexers increase the daily indexing capacity to over 300 GB

Now, let’s look at the limitations of this deployment:

  • No HA and DR
  • The SH needs to be reconfigured every time a new search peer/indexer is added
  • Search results might be incomplete when one of the indexers is down, and data ingestion might be impacted as well
  • If a single standalone SH is used instead of an SHC, that SH is a single point of failure (SPOF)
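
The incomplete-results limitation can be illustrated with a toy model. This is only a sketch, not how Splunk implements distributed search: the function name `distributed_search`, the indexer names, and the sample events are all made up for illustration. Each standalone indexer holds its own share of the events, and the SH merges whatever the reachable indexers return.

```python
from typing import Dict, List, Set


def distributed_search(indexers: Dict[str, List[str]], down: Set[str], term: str) -> List[str]:
    """Fan a search out to every reachable indexer and merge the matches."""
    results: List[str] = []
    for name, events in indexers.items():
        if name in down:
            continue  # an unreachable indexer contributes nothing to the result set
        results.extend(event for event in events if term in event)
    return sorted(results)


# Hypothetical data: each standalone indexer stores a different slice of the events.
indexers = {
    "idx1": ["error disk full", "info login ok"],
    "idx2": ["error timeout"],
}

all_up = distributed_search(indexers, set(), "error")
one_down = distributed_search(indexers, {"idx2"}, "error")
print(all_up)    # ['error disk full', 'error timeout']
print(one_down)  # ['error disk full']
```

Because no other indexer holds a copy of the events on idx2, nothing can stand in for it while it is down.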

Let’s take a look at distributed clustered deployment, which is a more advanced setup than a distributed non-clustered deployment.

Distributed clustered deployment and SHC – single-site

A distributed clustered SH and indexer deployment at a single site is a highly available, resilient architecture. A site is a classic data center in a particular region/geography.

The following diagram shows the clustered deployment architecture with a separate SHC and clustered indexing tier running on a single site:

Figure 1.6: Distributed clustered deployment and SHC – single-site

Figure 1.6 depicts an architecture similar to that of Figure 1.5, with one additional management component, known as the CM. The CM oversees and manages the indexer cluster, coordinating and controlling its operation. It acts as the central point for configuring and monitoring the indexers within the cluster, keeping them functioning and synchronized.

Let’s understand the advantages of this over the two architectures we previously looked at:

  • Using an SHC avoids an outage in the case of node failure by replicating the configs and artifacts. Job scheduling is managed by the captain, which is a member elected from within the cluster itself.
  • The indexer cluster enables data HA by maintaining redundant copies across the cluster. The CM is a separate management component that ensures data availability for searches.
  • Unlike a single instance, the SHC scales by adding more nodes, which increases its capacity to handle concurrent users and execute searches. As a rule of thumb, each CPU core counts as one unit of search capacity, meaning one core runs one search at a time.
  • The SHC-D and CM are additional management components that aid in the deployment of apps to cluster members.
  • The SHC learns about new search peers/indexers from the CM, so it doesn’t require reconfiguration for every new indexer node. Similarly, the Indexer Discovery feature lets forwarders obtain the current list of indexers from the CM.
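
To see why redundant copies give data HA, here is a minimal sketch. It assumes a simple round-robin placement that is purely illustrative; the names `place_copies` and `searchable_buckets` and the `copies` parameter are hypothetical and do not correspond to Splunk settings or to the way the CM actually assigns copies.

```python
from typing import Dict, List, Set


def place_copies(buckets: List[str], peers: List[str], copies: int) -> Dict[str, Set[str]]:
    """Assign each bucket to `copies` distinct peers, round-robin (illustrative only)."""
    if not 1 <= copies <= len(peers):
        raise ValueError("copies must be between 1 and the number of peers")
    placement: Dict[str, Set[str]] = {}
    for i, bucket in enumerate(buckets):
        placement[bucket] = {peers[(i + k) % len(peers)] for k in range(copies)}
    return placement


def searchable_buckets(placement: Dict[str, Set[str]], failed: Set[str]) -> List[str]:
    """Return the buckets that still have at least one copy on a surviving peer."""
    return sorted(b for b, holders in placement.items() if holders - failed)


peers = ["idx1", "idx2", "idx3"]
buckets = ["b0", "b1", "b2", "b3"]

single_copy = place_copies(buckets, peers, copies=1)
two_copies = place_copies(buckets, peers, copies=2)

print(searchable_buckets(single_copy, {"idx1"}))  # ['b1', 'b2']
print(searchable_buckets(two_copies, {"idx1"}))   # ['b0', 'b1', 'b2', 'b3']
```

With a single copy, losing idx1 makes two buckets unsearchable. With two copies, every bucket survives the loss of any one peer.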

Now, let’s look at its limitations compared to the previous architectures:

  • No DR with a single site. A failure of the site results in the failure of the entire deployment.
  • The SHC is limited to 100 nodes.
  • It increases the TCO and the management overhead of the SHC and indexer cluster.
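
The one-CPU-per-search rule of thumb and the 100-node limit mentioned in this section can be combined into a rough capacity estimate. This is a back-of-the-envelope sketch only: the function `shc_search_capacity` is hypothetical, and Splunk’s actual, configurable concurrency limits are not modeled here.

```python
MAX_SHC_NODES = 100  # node limit quoted in this section


def shc_search_capacity(nodes: int, cpus_per_node: int) -> int:
    """Estimate concurrent searches, assuming one CPU handles one search at a time."""
    if not 1 <= nodes <= MAX_SHC_NODES:
        raise ValueError(f"an SHC must have between 1 and {MAX_SHC_NODES} nodes")
    if cpus_per_node < 1:
        raise ValueError("each node needs at least one CPU")
    return nodes * cpus_per_node


# Example: three SHC nodes with 16 CPUs each.
print(shc_search_capacity(3, 16))  # 48
```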

Let’s take a look at the multi-site distributed clustered deployment, which is a more advanced setup than the single-site distributed clustered deployment.

Distributed clustered deployment and SHC – multi-site

This is by far the most complex architecture, and it suits organizations that have strict HA and DR requirements. It has the same advantages as the single-site architecture (as seen in the previous section), with the added benefit that the failure of a site doesn’t impact the entire deployment.

The following diagram shows a clustered deployment architecture with an SHC and clustered indexing tiers deployed in more than one site:

Figure 1.7: Distributed clustered deployment and SHC – multi-site

As in Figure 1.6, the components remain the same in each site. However, the collection tier is shared across the sites, and each site has a dedicated SHC.

Let’s understand the limitations and considerations:

  • Indexer clusters replicate data between sites, which is called cross-site replication and requires low network latency between sites. 100 milliseconds or less is preferred.
  • SHCs work independently and do not share artifacts and common configurations.
  • SHCs have a 100-node limitation per site.
  • A dedicated SHC-D is required for each site.
  • A single CM node suffices for an entire cluster of indexers across sites.
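
The latency guidance can be turned into a quick planning check. This is a minimal sketch under assumptions: the site names and measured values are invented, and `slow_links` is a hypothetical helper, not a Splunk tool.

```python
from typing import Dict, List, Tuple

PREFERRED_MAX_LATENCY_MS = 100  # preferred upper bound quoted in the list above


def slow_links(latencies_ms: Dict[Tuple[str, str], float]) -> List[Tuple[str, str]]:
    """Return the site pairs whose measured latency exceeds the preferred bound."""
    return sorted(
        pair for pair, ms in latencies_ms.items() if ms > PREFERRED_MAX_LATENCY_MS
    )


# Hypothetical round-trip measurements between sites, in milliseconds.
measured = {
    ("site1", "site2"): 35.0,
    ("site1", "site3"): 140.0,
}

print(slow_links(measured))  # [('site1', 'site3')]
```

Any pair the check flags is a candidate for a closer look before enabling cross-site replication between those sites.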

We’ve looked at architectures ranging from a very basic single-server deployment (preferably used for testing or development) to an advanced multi-site clustered deployment. Each has its advantages, limitations, and cost implications. At this stage, you should be familiar with Splunk components and architectures. In the next section, we are going to install the standalone/single-server deployment that we talked about at the very beginning of this section.
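
As a recap, the trade-offs discussed in this section can be condensed into a small decision helper. Treat it as a simplified sketch: the thresholds come from the single-server limitations listed earlier, the `Needs` class and `suggest_architecture` function are hypothetical, and a real design involves many more variables than these four.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    users: int               # expected number of users
    daily_ingest_gb: float   # expected daily indexing volume
    needs_ha: bool           # must survive the failure of a node
    needs_dr: bool           # must survive the failure of a whole site


def suggest_architecture(needs: Needs) -> str:
    """Map requirements to one of the four architectures covered in this section."""
    if needs.needs_dr:
        return "distributed clustered deployment and SHC, multi-site"
    if needs.needs_ha:
        return "distributed clustered deployment and SHC, single-site"
    if needs.users <= 8 and needs.daily_ingest_gb < 300:
        return "single-server deployment"
    return "distributed non-clustered deployment"


print(suggest_architecture(Needs(users=5, daily_ingest_gb=50, needs_ha=False, needs_dr=False)))
print(suggest_architecture(Needs(users=40, daily_ingest_gb=800, needs_ha=True, needs_dr=False)))
```

The order of the checks mirrors the progression of this section: DR forces multi-site, HA forces clustering, and only small workloads without either requirement fit a single server.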