Python Digital Forensics Cookbook

By: Chapin Bryce, Preston Miller

Overview of this book

Technology plays an increasingly large role in our daily lives and shows no sign of stopping. Now, more than ever, it is paramount that an investigator develops programming expertise to deal with increasingly large datasets. By leveraging the Python recipes explored throughout this book, we make the complex simple, quickly extracting relevant information from large datasets. You will explore, develop, and deploy Python code and libraries to provide meaningful results that can be immediately applied to your investigations. The recipes in the Python Digital Forensics Cookbook cover topics such as working with forensic evidence containers, parsing mobile and desktop operating system artifacts, extracting embedded metadata from documents and executables, and identifying indicators of compromise. You will also learn to integrate scripts with Application Programming Interfaces (APIs) such as VirusTotal and PassiveTotal, and with tools such as Axiom, Cellebrite, and EnCase. By the end of the book, you will have a sound understanding of Python and how you can use it to process artifacts in your investigations.

Preface

What this book covers

What you need for this book

Essential Scripting and File Information Recipes

Introduction

Handling arguments like an adult

Iterating over loose files

Recording file attributes

Copying files, attributes, and timestamps

Hashing files and data streams

Keeping track with a progress bar

Logging results

Multiple hands make light work

Creating Artifact Report Recipes

Introduction

Using HTML templates

Creating a paper trail

Working with CSVs

Visualizing events with Excel

Auditing your work

A Deep Dive into Mobile Forensic Recipes

Introduction

Parsing PLIST files

Handling SQLite databases

Identifying gaps in SQLite databases

Processing iTunes backups

Putting Wi-Fi on the map

Digging deep to recover messages

Extracting Embedded Metadata Recipes

Introduction

Extracting audio and video metadata

The big picture

Mining for PDF metadata

Reviewing executable metadata

Reading office document metadata

Integrating our metadata extractor with EnCase

Networking and Indicators of Compromise Recipes

Introduction

Getting a jump start with IEF

Coming into contact with IEF

Beautiful Soup

Going hunting for viruses

Gathering intel

Totally passive

Reading Emails and Taking Names Recipes

Parsing PST and OST mailboxes

Log-Based Artifact Recipes

Introduction

About time

Parsing IIS web logs with RegEx

Going spelunking

Interpreting the daily.out log

Adding daily.out parsing to Axiom

Scanning for indicators with YARA

Working with Forensic Evidence Container Recipes

Introduction

Opening acquisitions

Gathering acquisition and media information

Iterating through files

Processing files within the container

Searching for hashes

Exploring Windows Forensic Artifacts Recipes - Part I

Introduction

One man's trash is a forensic examiner's treasure

A sticky situation

Reading the registry

Gathering user activity

The missing link

Searching high and low

Exploring Windows Forensic Artifacts Recipes - Part II

Introduction

Parsing prefetch files

A series of fortunate events

Indexing internet history

Shadow of a former self

Dissecting the SRUM database

Hashing files and data streams

Recipe Difficulty: Easy

Python Version: 2.7 or 3.5

Operating System: Any

File hashes are a widely accepted means of verifying file integrity and authenticity. While some algorithms, such as MD5 and SHA-1, have become vulnerable to collision attacks, hashing remains an important process in the field. In this recipe, we will cover the process of hashing a string of characters and a stream of file content.

Getting started

All libraries used in this script are present in Python’s standard library. For generating hashes of files and other data sources, we use the hashlib library. This built-in library supports common algorithms such as MD5, SHA-1, and SHA-256, among others. As of the writing of this book, many tools still rely on the MD5 and SHA-1 algorithms, though the current recommendation is to use SHA-256 at a minimum. Alternatively, one could record several different hashes of the same file to further decrease the odds that a collision goes unnoticed. While we'll showcase a few of these algorithms, other, less commonly used algorithms are also available.
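The idea of recording several hashes for one file can be sketched as follows. This is a minimal illustration written for Python 3, not part of the recipe's script: the function name hash_stream_multi, its parameters, and the sample data are assumptions made for the example. It reads the stream once and feeds each chunk to every hash object, so the data is not read again for each algorithm.

```python
import hashlib
import io


def hash_stream_multi(stream, algorithms=("md5", "sha256"), buff_size=65536):
    """Return {algorithm name: hex digest} after a single pass over stream.

    Hypothetical helper for illustration; not part of the recipe's script.
    """
    # hashlib.new() builds a hash object from an algorithm name.
    hashers = {name: hashlib.new(name) for name in algorithms}
    # read() returns b"" at the end of the stream, which stops iter().
    for chunk in iter(lambda: stream.read(buff_size), b""):
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


# In-memory stand-in for an opened evidence file.
sample = io.BytesIO(b"example evidence data")
for name, digest in sorted(hash_stream_multi(sample).items()):
    print("{}: {}".format(name, digest))
```

Any binary file object opened with mode 'rb' could be passed in place of the in-memory sample.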

To learn more about the hashlib library, visit https://docs.python.org/3/library/hashlib.html.

How to do it…

We hash files with the following steps:

1. Print hashed filename using the specified input file and algorithm.
2. Print hashed file data using the specified input file and algorithm.

How it works…

To begin, we must import hashlib, as shown in the following code. For ease of use, we have defined a dictionary of the algorithms that our script can use: MD5, SHA-1, SHA-256, and SHA-512. By updating this dictionary, we can support other hash functions whose objects provide update() and hexdigest() methods, including some from libraries other than hashlib:

from __future__ import print_function
import argparse
import hashlib
import os

__authors__ = ["Chapin Bryce", "Preston Miller"]
__date__ = 20170815
__description__ = "Script to hash a file's name and contents"

available_algorithms = {
    "md5": hashlib.md5,
    "sha1": hashlib.sha1,
    "sha256": hashlib.sha256,
    "sha512": hashlib.sha512
}

parser = argparse.ArgumentParser(
    description=__description__,
    epilog="Developed by {} on {}".format(", ".join(__authors__), __date__)
)
parser.add_argument("FILE_NAME", help="Path of file to hash")
parser.add_argument("ALGORITHM", help="Hash algorithm to use",
                    choices=sorted(available_algorithms.keys()))
args = parser.parse_args()

input_file = args.FILE_NAME
hash_alg = args.ALGORITHM

In the code that follows, notice how we create each hashing object: we look up the constructor in our dictionary using the algorithm name provided at the command line, then add open and close parentheses to call it and instantiate the object. Because the lookup is driven by the dictionary, adding a new hashing algorithm only requires a new dictionary entry.

With our hash algorithms defined, we can now hash the file's absolute path, a method similar to the one used to name files in iTunes backups of iOS devices. We first encode the path string to bytes and then pass it to the update() method. When we are ready to display the hex value of the calculated hash, we call the hexdigest() method on our file_name object:

file_name = available_algorithms[hash_alg]()
abs_path = os.path.abspath(input_file)
file_name.update(abs_path.encode())

print("The {} of the filename is: {}".format(
    hash_alg, file_name.hexdigest()))
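Since update() operates on bytes, the encoding used for a path string affects the resulting digest whenever the path contains non-ASCII characters. The following sketch, written for Python 3, illustrates this along with the cumulative behavior of update(). The hash_text helper and the sample path are hypothetical and exist only for this example.

```python
import hashlib


def hash_text(text, algorithm="sha256", encoding="utf-8"):
    """Hash a text string after encoding it to bytes (hypothetical helper)."""
    hasher = hashlib.new(algorithm)
    hasher.update(text.encode(encoding))
    return hasher.hexdigest()


# Hypothetical path containing non-ASCII characters.
path = "/cases/0001/r\u00e9sum\u00e9.docx"

# Different encodings produce different bytes, and so different digests.
print(hash_text(path, encoding="utf-8"))
print(hash_text(path, encoding="utf-16-le"))

# update() is cumulative: hashing in pieces matches hashing in one call.
pieces = hashlib.sha256()
pieces.update(b"/cases/")
pieces.update(b"0001")
assert pieces.hexdigest() == hashlib.sha256(b"/cases/0001").hexdigest()
```

When comparing filename hashes against those produced by another tool, it is worth confirming which encoding that tool applies before hashing.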

Let's move on to opening the file and hashing its contents. While we could read the entire file and pass it to the hash function in one call, not all files are small enough to fit in memory. To ensure our code works on larger files, we will use the technique in the following example to read the file piecemeal and hash it in chunks.

By opening the file in rb mode, we ensure that we read the raw bytes of the file rather than decoded text. With the file open, we define the buffer size used to read in content and then read in the first chunk of data.

Entering a while loop, we update our hashing object with the new content for as long as there is content left in the file. This works because the read() method accepts an integer number of bytes to read and, if that integer is larger than the number of bytes remaining in the file, simply returns the remaining bytes. Once the end of the file is reached, read() returns an empty bytes object, which evaluates as false and ends the loop.

Once the entire file is read, we call the hexdigest() method of our object to display the file hash to the examiner:

file_content = available_algorithms[hash_alg]()
with open(input_file, 'rb') as open_file:
    # Number of bytes to read from the file at a time
    buff_size = 1024
    buff = open_file.read(buff_size)

    # read() returns an empty bytes object at the end of the file,
    # which ends the loop
    while buff:
        file_content.update(buff)
        buff = open_file.read(buff_size)

print("The {} of the content is: {}".format(
    hash_alg, file_content.hexdigest()))
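The chunked approach yields the same digest regardless of the buffer size, since the hash object only sees a continuous stream of bytes. The sketch below, written for Python 3, wraps the loop in a function and checks this against a one-shot hash of the same data. The hash_file name, its default buffer size, and the temporary test file are assumptions made for illustration rather than part of the recipe.

```python
import hashlib
import os
import tempfile
from functools import partial


def hash_file(path, algorithm="sha256", buff_size=1024 * 1024):
    """Hash a file's content in chunks (hypothetical helper)."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as open_file:
        # Call read(buff_size) repeatedly until it returns b"" at end of file.
        for chunk in iter(partial(open_file.read, buff_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


# Demonstrate on a throwaway temporary file.
content = b"A" * 3000000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(content)
try:
    small_chunks = hash_file(tmp.name, "md5", buff_size=1024)
    large_chunks = hash_file(tmp.name, "md5", buff_size=1024 * 1024)
    one_shot = hashlib.md5(content).hexdigest()
    # All three values are identical.
    print(small_chunks == large_chunks == one_shot)
finally:
    os.remove(tmp.name)
```

A larger buffer mainly reduces the number of read calls; the choice of size is a trade-off between memory use and speed and does not affect the result.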

When we execute the code, we see the output from the two print statements, revealing the hash value of the file's absolute path and of its content. We can generate additional hashes for the file by changing the algorithm at the command line.

There's more…

This script can be further improved. Here's a recommendation:

Add support for additional hashing algorithms and create the appropriate entry within the available_algorithms global variable
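One possible way to approach this recommendation is sketched below. The choice of extra algorithms is an assumption for illustration: sha384 is available in hashlib on the Python versions this recipe targets, while sha3_256 and blake2b were added in Python 3.6, so each constructor is looked up with getattr() and added only when the running interpreter provides it.

```python
import hashlib

available_algorithms = {
    "md5": hashlib.md5,
    "sha1": hashlib.sha1,
    "sha256": hashlib.sha256,
    "sha512": hashlib.sha512,
}

# Candidate additions chosen for illustration. Each is added only if this
# interpreter's hashlib provides a constructor with that name.
for name in ("sha384", "sha3_256", "blake2b"):
    constructor = getattr(hashlib, name, None)
    if constructor is not None:
        available_algorithms[name] = constructor

# Every entry is used the same way as in the recipe: call the constructor,
# feed it bytes with update(), and read the result with hexdigest().
for name in sorted(available_algorithms):
    hasher = available_algorithms[name]()
    hasher.update(b"test")
    print("{}: {}".format(name, hasher.hexdigest()))
```

Because the argparse choices in the script are built from the dictionary keys, new entries would become selectable at the command line without further changes.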
