Practical Threat Intelligence and Data-Driven Threat Hunting

By: Valentina Costa-Gazcón

Overview of this book

Threat hunting (TH) provides cybersecurity analysts and enterprises with the opportunity to proactively defend themselves by getting ahead of threats before they can cause major damage to their business. This book is not only an introduction for those who don’t know much about the cyber threat intelligence (CTI) and TH world, but also a guide for those with more advanced knowledge of other cybersecurity fields who are looking to implement a TH program from scratch. You will start by exploring what threat intelligence is and how it can be used to detect and prevent cyber threats. As you progress, you’ll learn how to collect data, along with understanding it by developing data models. The book will also show you how to set up an environment for TH using open source tools. Later, you will focus on how to plan a hunt with practical examples, before going on to explore the MITRE ATT&CK framework. By the end of this book, you’ll have the skills you need to be able to carry out effective hunts in your own environment.

The collection process

Once the intelligence requirements (IR) have been defined, we can proceed with collecting the raw data we need to fulfill them. For this process, we can consult two types of sources: internal sources (such as networks and endpoints) and external sources (such as blogs, threat intelligence feeds, threat reports, public databases, forums, and so on).

The most effective way to carry out the collection process is to use a collection management framework (CMF). Using a CMF allows you to identify data sources and easily track the type of information you are gathering from each one. It can also be used to rate the data obtained from each source, recording how long that data is stored and how trustworthy and complete the source is. It is advisable to use the CMF to track not only external sources, but also internal ones. Here's an example of what one would look like:

Figure 1.5 – Simple CMF example

Dragos analysts Lee, Miller, and Stacey wrote an interesting paper on using a CMF, which explores different methodologies and examples. Another great resource that can be used to design an advanced collection process is the Collection Management Implementation Framework, designed by the Software Engineering Institute.
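
The idea of a CMF described above can be sketched as a small data structure. This is only an illustration under stated assumptions: the field names, the 1 to 5 rating scales, and the sample entries are hypothetical and are not taken from Figure 1.5 or from any of the frameworks mentioned.

```python
from dataclasses import dataclass
from enum import Enum


class SourceKind(Enum):
    INTERNAL = "internal"
    EXTERNAL = "external"


@dataclass
class DataSource:
    """One row of a collection management framework (hypothetical layout)."""
    name: str
    kind: SourceKind
    data_types: list[str]   # kinds of information the source provides
    retention_days: int     # how long the data is stored
    trust: int              # 1 (low) to 5 (high); assumed scale
    completeness: int       # 1 (low) to 5 (high); assumed scale


def sources_for(cmf: list[DataSource], data_type: str) -> list[DataSource]:
    """Return the sources that provide data_type, most trusted first."""
    matches = [s for s in cmf if data_type in s.data_types]
    return sorted(matches, key=lambda s: (s.trust, s.completeness), reverse=True)


# Illustrative entries only.
cmf = [
    DataSource("Endpoint logs", SourceKind.INTERNAL, ["process", "file hash"], 90, 5, 4),
    DataSource("Firewall logs", SourceKind.INTERNAL, ["ip", "domain"], 30, 4, 3),
    DataSource("Public threat feed", SourceKind.EXTERNAL, ["ip", "domain", "file hash"], 365, 3, 2),
]

for source in sources_for(cmf, "file hash"):
    print(source.name, source.kind.value, source.retention_days)
```

Keeping internal and external sources in the same structure makes it easy to see, for any type of data, which sources can answer a question and how much weight each one deserves.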

Indicators of compromise

So far, we've talked about finding the IR and how to use a CMF. But what data are we going to collect?

An indicator of compromise (IOC), as the name suggests, is an artifact that's been observed in a network or in an operating system that, with high confidence, indicates that it has been compromised. This forensic data is used to understand what happened, but if collected properly, it can also be used to prevent or detect ongoing breaches.

Typical IOCs may include hashes of malicious files, URLs, domains, IPs, paths, filenames, Registry keys, and malware files themselves.

It is important to remember that IOCs are only really useful when they are accompanied by context. Here, we can follow the mantra of quality over quantity: a large number of IOCs does not always mean better data.
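
One way to keep context attached to each IOC is to store it as a record rather than as a bare value. The following is a minimal sketch, not a standard format: the `Indicator` fields, the `guess_ioc_type` helper, and the 0 to 100 confidence scale are assumptions made for illustration, and the type detection is deliberately simplistic.

```python
import ipaddress
import re
from dataclasses import dataclass, field
from datetime import date

HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}
HEX_RE = re.compile(r"^[0-9a-fA-F]+$")
DOMAIN_RE = re.compile(r"^(?=.{1,253}$)([a-z0-9-]{1,63}\.)+[a-z]{2,63}$", re.IGNORECASE)


def guess_ioc_type(value: str) -> str:
    """Make a rough guess at the IOC type from the shape of the value."""
    value = value.strip()
    if HEX_RE.match(value) and len(value) in HASH_LENGTHS:
        return HASH_LENGTHS[len(value)]
    try:
        ipaddress.ip_address(value)
        return "ip"
    except ValueError:
        pass
    if value.lower().startswith(("http://", "https://")):
        return "url"
    if DOMAIN_RE.match(value):
        return "domain"
    return "unknown"


@dataclass
class Indicator:
    """An IOC together with the context that makes it usable."""
    value: str
    source: str        # where the indicator was collected
    first_seen: date
    description: str   # what was observed and why it matters
    confidence: int    # 0 to 100; assumed scale
    ioc_type: str = field(init=False)

    def __post_init__(self) -> None:
        self.ioc_type = guess_ioc_type(self.value)


# 192.0.2.1 is a reserved documentation address, used here as a placeholder.
ioc = Indicator("192.0.2.1", "hypothetical vendor report", date(2021, 1, 15),
                "address contacted by the sample after execution", 70)
print(ioc.ioc_type, ioc.source, ioc.confidence)
```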

Understanding malware

Malware, short for malicious software, is not everything, but it can be an incredibly valuable source of information. Before we look at the different types of malware, it is important for us to understand how malware typically works. Here, we need to introduce two concepts: the dropper and the Command and Control (C2 or C&C).

A dropper is a special type of software designed to install a piece of malware. We will sometimes talk about single-stage and two-stage droppers, depending on whether or not the malware code is contained in the dropper. When the malicious code is not contained within the dropper, it is downloaded to the victim's device from an external source. Some security researchers call this type of dropper a downloader, and reserve the term two-stage dropper for one that requires further steps (such as decompressing or executing different pieces of code) to assemble the final piece of malware.

The Command and Control (C2) is an attacker-controlled server that's used to send commands to the malware running on the victim's systems. It's the way the malware communicates with its "owner." There are multiple ways that a C2 can be established and, depending on the malware's capabilities, the complexity of the commands and of the communication that can be established may vary. For example, threat actors have been seen using cloud-based services, emails, blog comments, GitHub repositories, and DNS queries, among other things, for C2 communication.

There are different types of malware according to their capabilities, and sometimes, one malware piece can be classified as more than one type. The following is a list of the most common ones:

  • Worm: An autonomous program capable of replicating and propagating itself through the network.
  • Trojan: A program that appears to serve a designated purpose, but also has a hidden malicious capability to bypass security mechanisms, thus abusing the authorization that's been given to it.
  • Rootkit: A set of software tools with administrator privileges, designed to hide the presence of other tools and hide their activities.
  • Ransomware: A computer program designed to deny access to a system or its information until a ransom has been paid.
  • Keylogger: Software or hardware that records keyboard events without the user's knowledge.
  • Adware: Malware that offers the user specific advertising.
  • Spyware: Software that has been installed onto a system without the knowledge of the owner or the user, with the intention of gathering information about them and monitoring their activity.
  • Scareware: Malware that tricks computer users into visiting compromised websites.
  • Backdoor: The method by which someone can obtain administrator user access in a computer system, a network, or a software application.
  • Wiper: Malware that erases the hard drive of the computer it infects.
  • Exploit kit: A package that's used to manage a collection of exploits that could use malware as a payload. When a victim visits a compromised website, the kit probes the victim's system for vulnerabilities in order to exploit them.

A malware family refers to a group of malicious software with common characteristics and, most likely, the same author. Sometimes, a malware family can be directly related to a specific threat actor. At other times, malware (or a tool) is shared among different groups. This happens a lot with open source malware tools that are publicly available. Leveraging them helps the adversary disguise its identity.

Now let's take a quick look at how we can collect data around pieces of malware.

Using public sources for collection – OSINT

Open Source Intelligence (OSINT) is the process of collecting publicly available data. The most common sources that come to mind when talking about OSINT are social media, blogs, news, and the dark web. Essentially, any data that's made publicly available can be used for OSINT purposes.

Important Note

There are many great resources for someone looking to start collecting information: VirusTotal, CCSS Forum, and URLHaus are great places to get started with the collection process.

Also, take a look at ( to learn more about OSINT resources and techniques.


Honeypots

A honeypot is a decoy system that imitates possible targets of attacks. A honeypot can be set up to detect, deflect, or counteract an attacker. All traffic that's received is considered malicious and every interaction with the honeypot can be used to study the attacker's techniques.

There are many types of honeypots, but they are mostly divided into three categories: low interaction, medium interaction, and high interaction.

Low interaction honeypots simulate the transport layer and provide very limited access to the operating system. Medium interaction honeypots simulate the application layer in order to lure the attacker into sending the payload. Finally, high interaction honeypots usually involve real operating systems and applications. These are better suited to uncovering the abuse of unknown vulnerabilities.
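
To make the low interaction category more concrete, here is a minimal sketch of a listener that accepts TCP connections, records who connected and what they sent, and never replies. It is an illustration only, not a production honeypot: the function names, the event fields, the 1,024-byte read limit, and the loopback address are assumptions, and the "attacker" is simulated by a local connection.

```python
import socket
import threading
from datetime import datetime, timezone


def serve(listener: socket.socket, events: list, max_connections: int) -> None:
    """Accept connections, record what each peer sends, and never reply."""
    for _ in range(max_connections):
        conn, (peer_ip, peer_port) = listener.accept()
        with conn:
            conn.settimeout(2.0)
            try:
                payload = conn.recv(1024)
            except OSError:  # includes timeouts and connection resets
                payload = b""
        events.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "peer_ip": peer_ip,
            "peer_port": peer_port,
            "payload": payload,
        })


def probe_once(payload: bytes) -> list:
    """Start the decoy on a free loopback port and simulate one probe."""
    events: list = []
    # Port 0 asks the operating system for any free port.
    with socket.create_server(("127.0.0.1", 0)) as listener:
        port = listener.getsockname()[1]
        worker = threading.Thread(target=serve, args=(listener, events, 1))
        worker.start()
        with socket.create_connection(("127.0.0.1", port)) as probe:
            probe.sendall(payload)
        worker.join()
    return events


for event in probe_once(b"USER admin\r\n"):
    print(event["peer_ip"], event["payload"])
```

Because nothing legitimate should ever connect to such a listener, every recorded event is worth reviewing. A real deployment would also need isolation from production systems and durable logging.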

Malware analysis and sandboxing

Malware analysis is the process of studying the functionality of malicious software. Typically, we can distinguish between two types of malware analysis: dynamic and static.

Static malware analysis refers to analyzing the software without executing it. Reverse engineering, or reversing, is a form of static malware analysis and is performed using a disassembler such as IDA or the more recent NSA tool, Ghidra, among others.
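
Before reaching for a disassembler, two simple static steps are often useful: hashing the sample and listing the printable strings it contains. The sketch below shows both without executing anything. The helper names and the sample bytes are made up for illustration; a real sample would be read from disk with something like `Path(...).read_bytes()`.

```python
import hashlib
import re


def file_hashes(data: bytes) -> dict:
    """Compute the hashes most commonly shared as IOCs."""
    return {name: hashlib.new(name, data).hexdigest()
            for name in ("md5", "sha1", "sha256")}


def printable_strings(data: bytes, min_length: int = 4) -> list:
    """Return runs of printable ASCII characters of at least min_length."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]


# Made-up bytes standing in for a suspicious file.
sample = (b"\x00\x01MZ\x90\x00http://example.com/payload"
          b"\x00\xff\xfeabc\x00cmd.exe /c whoami\x00")

print(file_hashes(sample)["sha256"])
for text in printable_strings(sample):
    print(text)
```

Strings such as URLs or command lines recovered this way can become candidate IOCs, although packed or obfuscated samples will usually reveal far less.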

Dynamic malware analysis is performed by observing the behavior of the malware piece once it's been executed. This type of analysis is usually performed in a controlled environment to avoid infecting production systems.

In the context of malware analysis, a sandbox is an isolated and controlled environment used to dynamically analyze pieces of malware automatically. In a sandbox, the suspected malware piece is executed and its behavior is recorded.

Of course, things are not always this simple, and malware developers implement techniques to prevent the malware from being sandboxed. At the same time, security researchers develop their own techniques to bypass the threat actor's anti-sandbox techniques. Despite this game of cat and mouse, sandboxing systems are still a crucial part of the malware analysis process.


There are some great online sandboxing solutions, such as Any Run and Hybrid Analysis. Cuckoo Sandbox is an open source, offline sandboxing system for Windows, Linux, macOS, and Android.