Practical Threat Detection Engineering

By: Megan Roddie, Jason Deyalsingh, Gary J. Katz

Overview of this book

Threat validation is an indispensable component of every security detection program, ensuring a healthy detection pipeline. This comprehensive detection engineering guide will serve as an introduction for those who are new to detection validation, providing valuable guidelines to swiftly bring you up to speed. The book will show you how to apply the supplied frameworks to assess, test, and validate your detection program. It covers the entire life cycle of a detection, from creation to validation, with the help of real-world examples. Featuring hands-on tutorials and projects, this guide will enable you to confidently validate the detections in your security program. This book serves as your guide to building a career in detection engineering, highlighting the essential skills and knowledge vital for detection engineers in today's landscape. By the end of this book, you’ll have developed the skills necessary to test your security detection program and strengthen your organization’s security measures.
Table of Contents (20 chapters)

  • Part 1: Introduction to Detection Engineering
  • Part 2: Detection Creation
  • Part 3: Detection Validation
  • Part 4: Metrics and Management
  • Part 5: Detection Engineering as a Career

Foundational concepts

Tracking and categorizing an adversary’s actions allows us to prioritize our detections and understand their scope, or coverage. The following subsections cover common frameworks and models that will be referenced throughout this book. They provide a starting model for framing cyberattacks, their granular sub-components, and how to defend against them.

The Unified Kill Chain

Cyberattacks tend to follow a predictable pattern that should be understood by defenders. This pattern was initially documented as the now famous Lockheed Martin Cyber Kill Chain. This model has been adapted and modernized over time by multiple vendors. The Unified Kill Chain is a notable modernization of the model. This model defines 18 broad tactics across three generalized goals, which provides defenders with a reasonable framework for designing appropriate defenses according to attackers’ objectives. Let’s look at these goals:

  • In: The attacker’s goal at this phase is to research the potential victim, discover possible attack vectors, and gain and maintain reliable access to a target environment.
  • Through: Having gained access to a target environment, the threat actor needs to orient themselves and gather supplemental resources required for the remainder of the attack, such as privileged credentials.
  • Out: These tactics are focused on completing the objective of the cyberattack. In the case of double extortion ransomware, this would include staging files for exfiltration, copying those files to attacker infrastructure, and, finally, the large-scale deployment of ransomware.

Figure 1.1, based on the Unified Kill Chain whitepaper by Paul Pols, shows the individual tactics in each phase of the kill chain:

Figure 1.1 – The Unified Kill Chain

To better understand how the Unified Kill Chain applies to cyberattacks, let’s look at how it maps to a well-known attack. We are specifically going to look at an Emotet attack campaign. Emotet is a malicious payload often distributed via email and used to deliver additional payloads that will carry out the attacker’s final objectives. The specific campaign we will analyze is one reported on by The DFIR Report in November 2022: https://thedfirreport.com/2022/11/28/emotet-strikes-again-lnk-file-leads-to-domain-wide-ransomware/.

Table 1.1 lists the stages of the attack, as reported in the article, and how they map to the Unified Kill Chain:

| Attack Event | Unified Kill Chain Phase Group | Unified Kill Chain Phase |
| --- | --- | --- |
| Emotet executed via LNK malspam attachment | In | Delivery |
| Emotet sends outbound SMTP spam email | Through | Pivoting |
| Domain enumeration via Cobalt Strike | Through | Discovery |
| Lateral movement to user workstation | Through | Pivoting |
| SMB share enumeration | Through | Discovery |
| Zerologon exploit attempt | In | Exploitation |
| Remote Management Agent installed | In | Command and control/persistence |
| Exfiltration via Rclone to Mega | Out | Exfiltration |
| Ransomware execution | Out | Impact |

Table 1.1 – Unified Kill Chain mapping for Emotet attack chain

As Table 1.1 shows, not every phase takes place in every attack, and the phases that do occur may not follow a linear order.
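The non-linear ordering in Table 1.1 can be checked mechanically. The following is a minimal sketch, not part of the original analysis: it encodes the table rows in the order reported and flags every event that returns to an earlier phase group. The names `ATTACK_CHAIN`, `GROUP_ORDER`, and `out_of_order_events` are illustrative assumptions, and "Through" is used here for the phase group also known as network propagation.

```python
# Table 1.1 rows in reported order: (attack event, phase group, phase).
# "Through" is the phase group also called network propagation.
ATTACK_CHAIN = [
    ("Emotet executed via LNK malspam attachment", "In", "Delivery"),
    ("Emotet sends outbound SMTP spam email", "Through", "Pivoting"),
    ("Domain enumeration via Cobalt Strike", "Through", "Discovery"),
    ("Lateral movement to user workstation", "Through", "Pivoting"),
    ("SMB share enumeration", "Through", "Discovery"),
    ("Zerologon exploit attempt", "In", "Exploitation"),
    ("Remote Management Agent installed", "In", "Command and control/persistence"),
    ("Exfiltration via Rclone to Mega", "Out", "Exfiltration"),
    ("Ransomware execution", "Out", "Impact"),
]

# Expected progression of the three phase groups.
GROUP_ORDER = {"In": 0, "Through": 1, "Out": 2}


def out_of_order_events(chain):
    """Return events whose phase group is earlier than one already reached."""
    furthest = -1
    regressions = []
    for event, group, _phase in chain:
        rank = GROUP_ORDER[group]
        if rank < furthest:
            regressions.append(event)
        furthest = max(furthest, rank)
    return regressions


regressions = out_of_order_events(ATTACK_CHAIN)
for event in regressions:
    print("Returned to an earlier phase group:", event)
```

In this encoding, the Zerologon exploit attempt and the remote management agent installation are flagged, because both "In" activities occur after the attacker has already reached the "Through" group.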

To read the full Unified Kill Chain whitepaper, visit this link: https://www.unifiedkillchain.com/assets/The-Unified-Kill-Chain.pdf.

Although the Unified Kill Chain follows the progression of a typical cyberattack, it is not uncommon, as the paper outlines and as our example shows, for the attacker to execute some tactics outside this expected order. The Unified Kill Chain provides a model for how threat actors carry out attacks, but it does not dive into the detailed techniques that can be used to achieve the goals of each phase in the kill chain. The MITRE ATT&CK framework provides more granular insight into the tactics, techniques, and procedures leveraged by threat actors.

The MITRE ATT&CK framework

The MITRE ATT&CK framework is a knowledge base developed by the MITRE Corporation. The framework classifies threat actor objectives and catalogs the granular tools and activities related to achieving those objectives.

ATT&CK stands for Adversarial Tactics, Techniques, and Common Knowledge. The MITRE ATT&CK framework groups adversarial techniques into high-level categories called tactics. Each tactic represents a smaller immediate goal within the overall cyberattack. This framework will be referenced frequently throughout this book, providing an effective model for designing and validating detections. The following points detail the high-level tactics included as part of the Enterprise ATT&CK framework:

  • Reconnaissance: This tactic falls within the initial foothold phase of the Unified Kill Chain. Here, the threat actor gathers information about their target. At this stage, the attacker may use tools to passively collect technical details about the target, such as any publicly accessible infrastructure, emails, vulnerable associate businesses, and the like. In ideal cases, the threat actor may identify publicly accessible and vulnerable interfaces, but reconnaissance can also include gathering information about employees of an organization to identify possible targets for social engineering and understand how various internal business processes work.
  • Resource development: This tactic falls within the initial foothold phase of the Unified Kill Chain. Having identified a plausible attack vector, threat actors design an appropriate attack and develop technical resources to facilitate the attack. This phase includes creating, purchasing, or stealing credentials, infrastructure, or capabilities specifically to support the operation against the target.
  • Initial access: This tactic falls within the initial foothold phase of the Unified Kill Chain. The threat actor attempts to gain access to an asset in the victim-controlled environment. A variety of tools can be leveraged in combination at this point, ranging from cleverly designed phishing campaigns to deploying code that weaponizes yet-undisclosed vulnerabilities in exposed software interfaces (also known as zero-day attacks).
  • Execution: This tactic falls within the initial foothold and network propagation phases of the Unified Kill Chain. The attacker aims to execute their code on a target asset. Code used in this phase typically attempts to collect additional details about the target network, understand the security context the code is executing under, or collect data and return it to infrastructure controlled by the threat actor.
  • Persistence: This tactic falls within the initial foothold category of the Unified Kill Chain. Initial access to a foreign environment can be volatile. Threat actors prefer robust and survivable access to target systems. Persistence techniques focus on maintaining access despite system restarts or modifications to identities and infrastructure.
  • Privilege escalation: This tactic falls within the network propagation category of the Unified Kill Chain. Having gained access to the victim-controlled environment, the threat actor typically attempts to attain the highest level of privileges possible. Privileged access provides a means for executing nearly every option available to the victim’s administrators, removing many roadblocks that might otherwise prevent the threat actor from acting on their objectives. Having privileged access can also make threat actor activities more challenging to detect.
  • Defense evasion: This tactic falls within the initial foothold category of the Unified Kill Chain. Threat actors must understand the victim’s defense systems to design appropriate methods for avoiding them. Successfully evading defenses increases the likelihood of a successful operation. These techniques focus specifically on finding ways to subvert or otherwise avoid the target’s defensive controls.
  • Credential access: This tactic falls within the initial foothold and action on objectives categories of the Unified Kill Chain. Identities control access to systems. Harvesting credentials or credential material is essential for completely dominating a victim’s environment. Access to multiple systems and credentials makes navigating environments easier and lets attackers pivot in the event that credentials are modified.
  • Discovery: This tactic falls within the network propagation category of the Unified Kill Chain. These techniques focus on understanding the victim’s internal environment. The internal network layout, infrastructure configuration, identity information, and defense systems must be understood to plan for the remaining phases of the attack.
  • Lateral movement: This tactic falls within the action on objectives category of the Unified Kill Chain. Systems that are accessed for the first time often do not have the information or resources (tools, credential material, direct connectivity, or visibility) required to complete objectives. Following the discovery of connected systems, and with the proper credentials, the adversary can, and often needs to, move from the current system to other connected systems. These techniques are all focused on traversing the victim’s environment.
  • Collection: This tactic falls within the action on objectives category of the Unified Kill Chain. These techniques focus on performing internal reconnaissance. Access to new environments provides new visibility, and understanding the technical environment is essential for planning the subsequent phases of the attack.
  • Command and control: This tactic falls within the initial foothold category of the Unified Kill Chain. Threat actors use these techniques to implement systems that allow them to remotely control assets in the victim’s environment.
  • Exfiltration: This tactic falls within the action on objectives category of the Unified Kill Chain. Not all attacks involve exfiltration activities, but techniques in this category have become more popular with the rise of ransomware double extortion attacks. You can find a more detailed description of double extortion ransomware attacks at https://www.zscaler.com/resources/security-terms-glossary/what-is-double-extortion-ransomware. These techniques aim to copy data out of the victim’s environment to attacker-controlled infrastructure.
  • Impact: This tactic falls within the action on objectives category of the Unified Kill Chain. At this point, the threat actor can take steps to complete their attack. For example, in the case of a ransomware attack, the large-scale encryption of data would fall into this phase.

We encourage you to explore the MITRE ATT&CK framework in full at https://attack.mitre.org/. In this book, we focus specifically on the Enterprise ATT&CK framework, but MITRE also provides frameworks for ICS and mobile-based attacks. The ATT&CK Navigator, located at https://mitre-attack.github.io/attack-navigator/, is also extremely useful for defenders who need to quickly search for and qualify tactics.

Most publications documenting incident response observations map their findings to kill chain phases and MITRE ATT&CK tactics, which helps defenders understand how to design detections and other preventative controls.
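One way the tactic list can support detection design is as a tagging scheme for measuring coverage. The sketch below is an illustration under stated assumptions rather than a method prescribed by ATT&CK: the detection names in `DETECTIONS` are hypothetical, each detection is tagged with a single tactic for simplicity, and `coverage_by_tactic` is an invented helper.

```python
from collections import Counter

# Enterprise ATT&CK tactics, in the order listed in the text.
TACTICS = [
    "Reconnaissance", "Resource development", "Initial access", "Execution",
    "Persistence", "Privilege escalation", "Defense evasion",
    "Credential access", "Discovery", "Lateral movement", "Collection",
    "Command and control", "Exfiltration", "Impact",
]

# Hypothetical detections, each tagged with the single tactic it targets.
DETECTIONS = {
    "lnk_attachment_spawns_script_host": "Execution",
    "new_remote_management_agent_service": "Persistence",
    "rclone_upload_to_cloud_storage": "Exfiltration",
    "mass_file_rename_by_single_process": "Impact",
    "smb_share_enumeration_burst": "Discovery",
    "domain_trust_enumeration": "Discovery",
}


def coverage_by_tactic(detections):
    """Count how many detections are tagged with each tactic."""
    counts = Counter(detections.values())
    return {tactic: counts.get(tactic, 0) for tactic in TACTICS}


coverage = coverage_by_tactic(DETECTIONS)
uncovered = [tactic for tactic, count in coverage.items() if count == 0]

for tactic, count in coverage.items():
    print(f"{tactic:22} {count}")
print("No detections for:", ", ".join(uncovered))
```

A count per tactic is only a rough indicator. It says nothing about how many techniques within a tactic are covered or how reliable each detection is.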

The Pyramid of Pain

Another helpful model for defenders to understand is the Pyramid of Pain. This model, developed by David Bianco, visualizes the relationship between the categories of indicators and the impact of defending each. This impact is expressed as the effort required by the threat actor to modify their attack once an effective defense is implemented for a given indicator category. Figure 1.2 shows the concept of the Pyramid of Pain:

Figure 1.2 – David Bianco’s Pyramid of Pain

As we can see, controls designed to operate on static indicators such as domain names, IP addresses, and hash values are trivial for adversaries to evade. For example, modifying the hash of a binary simply involves changing a single bit. It is far more difficult for an adversary to modify their tactics, techniques, and procedures (TTPs), which are essentially the foundation of their attack playbook. Controls that target TTPs are the gold standard for defense. However, these are usually more difficult to implement and require reliable data from protected assets, as well as a deep understanding of the adversary’s tactics and capabilities. Defensive controls designed for static indicators are effective for short-term, tactical defense. You can read David Bianco’s full blog post here: https://detect-respond.blogspot.com/2013/03/the-pyramid-of-pain.html.
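The claim that a single changed bit defeats a hash-based control is easy to demonstrate. The following minimal sketch uses Python's standard `hashlib` module; the byte string is a stand-in for a real binary, and `flip_lowest_bit` and `blocklist` are illustrative names. In practice, an adversary would more likely recompile or pad a file than flip an arbitrary bit, since a careless change could break the executable.

```python
import hashlib


def flip_lowest_bit(data: bytes) -> bytes:
    """Return a copy of data with the lowest bit of the last byte flipped."""
    if not data:
        raise ValueError("data must not be empty")
    return data[:-1] + bytes([data[-1] ^ 0x01])


# Stand-in content; a real control would hash the bytes of an actual file.
original = b"stand-in bytes for a malicious binary"
modified = flip_lowest_bit(original)

original_hash = hashlib.sha256(original).hexdigest()
modified_hash = hashlib.sha256(modified).hexdigest()

# A hash blocklist only matches the exact bytes it was built from.
blocklist = {original_hash}

print("original:", original_hash)
print("modified:", modified_hash)
print("original blocked:", original_hash in blocklist)
print("modified blocked:", modified_hash in blocklist)
```

The two inputs differ by one bit, yet the digests are unrelated, so the blocklist entry no longer matches the modified content.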

Throughout the remainder of this book, we will frequently reference these concepts. In later chapters, we will illustrate how these models can be used to understand cyberattacks, translate high-level business objectives for defense into detections, and measure coverage against known attacks.

Now that we have gained an understanding of the models for framing cyberattacks, let’s look into the most common types of cyberattacks.

Types of cyberattacks

To detect cyberattacks, detection engineers need to have a base understanding of the attacks that they will face. Some of the most prevalent attacks at the time of writing are summarized here to provide some introductory insight into the attacks we are trying to defend against.

Business Email Compromise (BEC)

The FBI reported receiving a total of 19,954 complaints related to Business Email Compromise (BEC) incidents in 2021. They estimate these complaints represent a cumulative loss of 2.4 billion dollars (USD). The full report can be accessed at https://www.ic3.gov/Media/PDF/AnnualReport/2021_IC3Report.pdf.

BEC attacks target users of the most popular and accessible user collaboration tool available – email. The electronic transfer of funds is a normal part of business operations for many organizations. Threat actors research organizations and identify personnel likely to be involved in correspondence related to the exchange of funds. Having identified a target, the threat actor leverages several techniques to gain access to the target’s mailbox (or someone adjacent from a business process perspective). With this access, the threat actor’s objective pivots to observing email exchanges to understand internal processes. During this time, the threat actor needs to understand the communication flows and key players. In ideal cases, they will identify a third-party contractor whom the organization conducts routine business with, the people who typically send correspondence for payments, and the person who approves these payments on behalf of the organization. Once the right opportunity arises, the threat actor can intercept and alter email conversations about payment, changing destination account numbers. If this goes unnoticed, funds may be deposited into the attacker’s account instead of the intended recipient.

Denial of service (DoS)

Denial of service (DoS) attacks attempt to make services unavailable to legitimate users by overwhelming the service or otherwise impairing the infrastructure the service depends on. There are three main types of DoS attacks: volumetric, protocol, and application attacks.

Volumetric attacks are executed by sending an inordinate volume of traffic to a target system. If the attack persists, it can degrade the service or disrupt it entirely. Protocol attacks focus on the network and transport layers and attempt to deplete the available resources of networking devices, making the target service unavailable. Application attacks send large volumes of requests to a target service. The service attempts to process each request, which consumes processing power on the underlying systems. Eventually, the available resources are exhausted, and service response times increase to the point where the service becomes unavailable. These types of attacks can be further categorized by their degree of automation and the techniques used.

Increasing the number of systems executing the attack can significantly increase the impact. By making use of compromised systems, threat actors can conduct synchronized DoS attacks against a single target, known as distributed denial of service (DDoS) attacks.
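To make the application-attack pattern concrete, the sketch below flags sources that send more than a threshold number of requests within a sliding time window. It is a simplified illustration, not a production detection: the function name `find_noisy_sources`, the window and threshold values, and the sample traffic (which uses documentation-reserved IP addresses) are all assumptions.

```python
from collections import defaultdict, deque


def find_noisy_sources(requests, window_seconds=10, threshold=100):
    """Flag sources exceeding `threshold` requests within `window_seconds`.

    `requests` is an iterable of (timestamp_seconds, source_ip) tuples,
    sorted by timestamp.
    """
    recent = defaultdict(deque)  # source_ip -> timestamps still in window
    noisy = set()
    for timestamp, source in requests:
        window = recent[source]
        window.append(timestamp)
        # Drop timestamps that have aged out of the window.
        while timestamp - window[0] >= window_seconds:
            window.popleft()
        if len(window) > threshold:
            noisy.add(source)
    return noisy


# Hypothetical traffic: one source sends 150 requests in about 3 seconds,
# another sends 20 requests spread over 20 seconds.
burst = [(i * 0.02, "203.0.113.10") for i in range(150)]
steady = [(float(i), "198.51.100.7") for i in range(20)]
traffic = sorted(burst + steady)

noisy_sources = find_noisy_sources(traffic)
print(noisy_sources)
```

A per-source counter like this would likely miss a DDoS attack, where each participating system may stay under the threshold. Counting requests per target service instead of per source is one possible adjustment.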

Malware outbreak

When malicious software, or malware, manages to evade defensive controls, the impact can range broadly, depending on the specific malware family. In low-impact cases, an end user may be bombarded with unsolicited pop-up ads, and in more extreme scenarios, malware can give full control of a system to a remote threat actor. The presence of malware in an enterprise environment usually indicates a possible deficiency in security controls. Seemingly low-impact malware infections can lead to more significant incidents, including full-blown ransomware attacks.

Insider threats

Employees of an organization who perform malicious activity against that organization are known as insider threats. Insider threats can exist at any level of the organization and have various motivations. Malicious insiders can be difficult to defend against since the organization has granted them a degree of trust.

Phishing

Phishing attacks fall under the category of social engineering, where threat actors design attacks around communication and collaboration tools, such as email, instant messaging apps, SMS text messages, and even regular phone calls. The underlying objective in all cases is to entice users to reveal sensitive information, such as credentials or banking information. BEC attacks typically leverage phishing techniques.

Ransomware

While the threat landscape is full of countless actors, with diverse goals ranging from stealthy cyber espionage to tech-support scams, the most prolific and impactful threat is the modern ransomware attack.

The goal of a ransomware attack is to interrupt critical business operations by taking critical systems offline and demanding payment, or a ransom, from the organization. In exchange for a successful payment, the threat actors claim they will return systems to a normal operating state.

Recently, some ransomware operators have added a separate extortion component to their playbook. During their ransomware attack, they exfiltrate sensitive data from the organization’s environment to attacker-controlled systems. Ransomware operators then threaten to publicize this data unless the ransom is paid. This is commonly referred to as a double extortion ransomware attack.

Successful ransomware operations put businesses in a frightening predicament. Apart from untangling the deep complexities of determining whether to pay the ransom, recovering from a successful cyberattack can take months or sometimes years.

These malicious operations have become increasingly sophisticated and successful over time. According to CrowdStrike, the first instance of modern ransomware was recorded in 2005. Between then and now, the frequency, scale, and sophistication of ransomware attacks have only increased. CrowdStrike’s History of Ransomware article provides a summary of the evolution of ransomware. You can read the full article here: https://www.crowdstrike.com/cybersecurity-101/ransomware/history-of-ransomware/.

The motivation for detection engineering

Successful breaches can have expensive impacts, requiring thousands of man-hours to remediate. IBM’s 2022 Cost of a Data Breach report found that the average total cost of a data breach amounted to 4.35 million USD. Typically, the earlier a threat is detected, the lower the cost of remediation. For every phase that an attacker advances through the kill chain, the cost of remediation goes up. While a threat hunt allows an organization to search for an adversary already inside its environment, identification occurs only when, and if, a search is performed. Detection, by contrast, allows an organization to identify malicious behavior as the activity is performed, reducing the mean time to detect. Given that the same IBM report determined that the average time to identify and contain a breach was 277 days, there is much work to be done to reduce the time to detection.
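Mean time to detect is a simple average of the gap between when malicious activity started and when it was detected. The sketch below shows the arithmetic with hypothetical incident timestamps. The data and the helper name `mean_time_to_detect_hours` are assumptions for illustration, and the metric is narrower than the identify-and-contain figure quoted from the IBM report.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incidents: (first malicious activity, time of detection).
INCIDENTS = [
    (datetime(2023, 1, 3, 9, 0), datetime(2023, 1, 3, 9, 30)),
    (datetime(2023, 2, 10, 14, 0), datetime(2023, 2, 12, 14, 0)),
    (datetime(2023, 3, 1, 8, 0), datetime(2023, 3, 1, 20, 0)),
]


def mean_time_to_detect_hours(incidents):
    """Average the start-to-detection gap across incidents, in hours."""
    durations = [
        (detected - started).total_seconds() / 3600
        for started, detected in incidents
    ]
    return mean(durations)


mttd = mean_time_to_detect_hours(INCIDENTS)
print(f"Mean time to detect: {mttd:.1f} hours")
```

Here the individual gaps are 0.5, 48, and 12 hours. A single slow detection dominates the average, which is one reason teams sometimes track the median alongside the mean.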

To understand how the time to detect an attack greatly determines the impact on the business, let’s consider a scenario where a threat actor gained initial access to an internet-connected workstation via a successful phishing campaign. This unauthorized access was immediately detected by the organization’s security team. They quickly isolated this workstation and performed a full re-imaging of its contents to a known-good state. They also performed a full reset of the user’s credentials, along with those of any other user who interacted with that workstation. Administrators identified the phishing email in their enterprise email solution, and all recipients had their workstations re-imaged and their credentials reset.

In this scenario, the steps that were taken by the security team were relatively simple to execute and would likely be sufficient to remove the threat from the environment. In contrast, if the threat actors were able to gain privileged access, exfiltrate data, and then deploy ransomware across all systems, the task becomes significantly more onerous. The security team would be faced with the dual task of understanding what happened while simultaneously advising on the best way to restore the business’s ability to operate safely. The following table summarizes how the number of assets impacted, the investigative requirements, and typical remediation efforts change across the Unified Kill Chain goals:

| | Initial Foothold | Network Propagation | Action on Objectives |
| --- | --- | --- | --- |
| Assets impacted | Low value. Typically, this involves edge devices, public-facing servers, or user workstations. Because of their position in modern architectures, these devices are typically untrusted by default. | Medium value. Some internal systems. Typically at this phase, the threat actor has access to some member servers within the environment and has a reliable C2 channel established. | High value. Critical servers such as Active Directory domain controllers, backup servers, or file servers. |
| Threat actor’s degree of control | Low. The threat actor has unreliable access to a system or is attempting to obtain access to a system, typically through phishing or attacking publicly facing services. Typically, this phase is the best opportunity for defenders to remove a threat. | Medium. The threat actor has enough control to traverse the network, but not enough control to execute objectives. At this point, threat actors typically have some credentials and have a reliable C2 channel established. | High. The threat actor is fully comfortable operating in the environment. They found all the resources needed to execute their objectives. At this point, they likely have the highest level of privileges available in the environment. |
| Data requirement for investigation | Relatively low. Typically, impact at this phase is limited to a small number of assets. Once identified at this phase, the data required for fully scoping the event is limited to a single host. | Significant. The capability to traverse the internal network typically indicates the presence of a reliable C2 channel. A higher volume of historic and real-time data is required to identify impacted assets. At this point, incident responders will need to have visibility of all connected assets to fully track lateral movement. | High. Investigators will require access to historical and real-time data from all connected assets. Additionally, in cases where data exfiltration is an objective, telemetry for the access and movement of data will also be required. This data is difficult to collect and is not typically tracked. |
| Effort required to remediate | Low. Activities at this phase typically occur on edge devices or public-facing assets. The typical posture is to treat these assets as untrusted, so it is common for environments to have capabilities for rapidly isolating these assets. | Medium. Traversing the network requires more investigative work to identify the individual assets that were accessed, the degree to which they were utilized, and the requirements for remediating. | High. In nearly every case, this requires rebuilding critical infrastructure. Often, this needs to occur with the added pressure of returning the business to a minimally operational state, to minimize losses. |

Table 1.2 – Generalized asset impact and effort versus kill chain goals

It’s plain to see the importance of finding out about cyberattacks in your environment and, more so, the importance of finding out as early as possible. The right person needs to get the relevant information about cyberattacks in a timely fashion. This is the primary objective of detection engineering.

Defining detection engineering

Quickly identifying, qualifying, and mitigating potential security incidents is a top priority for security teams. Identifying potential security incidents quickly is a fairly complicated problem to solve. In general terms, security personnel need to be able to do the following:

  1. Collect events from assets that require protection, as well as assets that can indirectly impact them.
  2. Identify events that may indicate a security incident, ideally as soon as they happen.
  3. Understand the impact of the potential incident.
  4. Communicate the high-value details of the event to all relevant teams for investigation and mitigation.
  5. Receive feedback from investigative teams to determine how the whole process can be improved.

Each of these steps can be difficult to execute within small environments. The complexity increases radically for any increase in the size of a managed environment.
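The five steps above can be sketched as a small pipeline. This is a toy illustration under stated assumptions, not a reference design: the `Alert` class, the `ASSET_CRITICALITY` lookup, the host names, and the single rule that flags `rclone.exe` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    rule: str
    host: str
    severity: str
    feedback: list = field(default_factory=list)


# Hypothetical asset criticality, used to estimate impact (step 3).
ASSET_CRITICALITY = {"dc01": "high", "ws-042": "low"}


def is_suspicious(event):
    """Step 2: a toy rule that flags executions of rclone.exe."""
    return event.get("process") == "rclone.exe"


def run_pipeline(events, notify):
    """Turn collected events into alerts and hand them to `notify`."""
    alerts = []
    for event in events:  # step 1: events collected from assets
        if not is_suspicious(event):  # step 2: identify candidate incidents
            continue
        # Step 3: estimate impact from the criticality of the asset.
        severity = ASSET_CRITICALITY.get(event["host"], "medium")
        alert = Alert("rclone_execution", event["host"], severity)
        notify(alert)  # step 4: communicate details to the relevant team
        alerts.append(alert)
    return alerts


events = [
    {"host": "ws-042", "process": "chrome.exe"},
    {"host": "dc01", "process": "rclone.exe"},
]
sent = []
alerts = run_pipeline(events, sent.append)

# Step 5: investigators record feedback used to improve the detection.
alerts[0].feedback.append("true positive; include parent process in alert")
print(alerts)
```

Passing `notify` in as a callable keeps the communication step separate from the detection logic, so the same rule could feed a ticketing system, a chat channel, or a test harness.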

Detection engineering definition

Detection engineering can be defined as a set of processes that enable potential threats to be detected within an environment. These processes encompass the end-to-end life cycle, from collecting detection requirements, aggregating system telemetry, and implementing and maintaining detection logic to validating program effectiveness.

To accomplish these goals, a good detection engineering program typically needs to implement four main processes:

  • Discovery: This involves collecting detection requirements. Here, you must determine whether the requirements are met with existing detections. You must also determine the criticality of the detection, as well as the audiences and timeframes for alerting.
  • Design, development, and testing: The detection requirement is interpreted, and a plan for implementing the detection is formulated. The designed detection is implemented first in a test environment and tested to ensure it produces the expected results.
  • Implementation and post-implementation monitoring: The detection is implemented in the production detection environment. Here, the performance of the detection and the detection systems is monitored.
  • Validation: Routine testing is performed to determine the effectiveness of the detection engineering program as a whole.

Figure 1.3 – The detection engineering processes

Chapter 2, The Detection Engineering Life Cycle, takes a deeper dive into each of these processes.

Important distinctions

Detection engineering can be misunderstood, partly because some processes overlap with other functions within a security organization. We can clarify detection engineering’s position with the following distinctions:

  • Threat hunting: The threat hunting process proactively develops investigative analyses based on a hypothesis that assumes a successful, undetected breach. The threat hunting process can identify active threats in the environment that managed to evade current security controls. This process provides input to the detection engineering program as it can identify deficiencies in detections. The data that’s available to detection engineering is typically the same data that threat hunters utilize. Therefore, threat hunting can also identify deficiencies in the existing data collection infrastructure that will need to be solved and integrated with the detection infrastructure.
  • Security operations center (SOC) operations: SOC teams typically focus on monitoring the security environment, whereas detection engineering provides inputs to SOC teams. Although the SOC consumes the products of the detection engineering function, the two teams typically work very closely together, with the SOC providing feedback for detection or collection improvements.
  • Data engineering: Data engineers design, implement, and maintain systems to collect, transform, and distribute data, typically to satisfy data analytics and business intelligence requirements. This aligns with several goals of detection engineering; however, the detection engineering program is heavily security-focused and relies on data engineering to produce the data it needs to build detections.

In this section, we examined some basic cybersecurity concepts that will be useful throughout this book as we dive into the detection engineering process. Furthermore, we established a definition for detection engineering. With this definition in mind, the following section will examine the value that a detection engineering program brings to an organization.