Book Image

Python Digital Forensics Cookbook

By : Chapin Bryce, Preston Miller
Book Image

Python Digital Forensics Cookbook

By: Chapin Bryce, Preston Miller

Overview of this book

Technology plays an increasingly large role in our daily lives and shows no sign of stopping. Now, more than ever, it is paramount that an investigator develops programming expertise to deal with increasingly large datasets. By leveraging the Python recipes explored throughout this book, we make the complex simple, quickly extracting relevant information from large datasets. You will explore, develop, and deploy Python code and libraries to provide meaningful results that can be immediately applied to your investigations. Throughout the Python Digital Forensics Cookbook, recipes include topics such as working with forensic evidence containers, parsing mobile and desktop operating system artifacts, extracting embedded metadata from documents and executables, and identifying indicators of compromise. You will also learn to integrate scripts with Application Program Interfaces (APIs) such as VirusTotal and PassiveTotal, and tools such as Axiom, Cellebrite, and EnCase. By the end of the book, you will have a sound understanding of Python and how you can use it to process artifacts in your investigations.
Table of Contents (11 chapters)

Iterating over loose files

Recipe Difficulty: Easy

Python Version: 2.7 or 3.5

Operating System: Any

Often it is necessary to iterate over a directory and its subdirectories to recursively process all files. In this recipe, we will illustrate how to use Python to walk through directories and access files within them. Understanding how you can recursively navigate a given input directory is key as we frequently perform this exercise in our scripts.

Getting started

All libraries used in this script are present in Python's standard library. The preferred library, in most situations, for handling file and folder iteration is the built-in os library. While this library supports many useful operations, we will focus on the os.path() and os.walk() functions. Let’s use the following folder hierarchy as an example to demonstrate how directory iteration works in Python:

SecretDocs/
|-- key.txt
|-- Plans
| |-- plans_0012b.txt
| |-- plans_0016.txt
| `-- Successful_Plans
| |-- plan_0001.txt
| |-- plan_0427.txt
| `-- plan_0630.txt
|-- Spreadsheets
| |-- costs.csv
| `-- profit.csv
`-- Team
|-- Contact18.vcf
|-- Contact1.vcf
`-- Contact6.vcf

4 directories, 11 files

How to do it…

The following steps are performed in this recipe:

  1. Create a positional argument for the input directory to scan.
  2. Iterate over all subdirectories and print file paths to the console.

How it works…

We create a very basic argument handler that accepts one positional input, DIR_PATH, the path of the input directory to iterate. As an example, we will use the ~/Desktop path, the parent of SecretDocs, as the input argument for the script. We parse the command-line arguments and assign the input directory to a local variable. We’re now ready to begin iterating over this input directory:

from __future__ import print_function
import argparse
import os

__authors__ = ["Chapin Bryce", "Preston Miller"]
__date__ = 20170815
__description__ = "Directory tree walker"

parser = argparse.ArgumentParser(
description=__description__,
epilog="Developed by {} on {}".format(
", ".join(__authors__), __date__)
)
parser.add_argument("DIR_PATH", help="Path to directory")
args = parser.parse_args()
path_to_scan = args.DIR_PATH

To iterate over a directory, we need to provide a string representing its path to os.walk(). This method returns three objects in each iteration, which we have captured in the root, directories, and files variables:

  • root: This value provides the relative path to the current directory as a string. Using the example directory structure, root would start as SecretDocs and eventually become SecretDocs/Team and SecretDocs/Plans/SuccessfulPlans.
  • directories: This value is a list of sub-directories located within the current root location. We can iterate through this list of directories, although the entries in this list will become part of the root value during successive os.walk() calls. For this reason, the value is not frequently used.
  • files: This value is a list of files in the current root location.
Be careful in naming the directory and file variables. In Python the dir and file names are reserved for other uses and should not be used as variable names.
# Iterate over the path_to_scan
for root, directories, files in os.walk(path_to_scan):

It is common to create a second for loop, as shown in the following code, to step through each of the files located in that directory and perform some action on them. Using the os.path.join() method, we can join the root and file_entry variables to obtain the file’s path. We then print this file path to the console. We may also, for example, append this file path to a list that we later iterate over to process each of the files:

    # Iterate over the files in the current "root"
for file_entry in files:
# create the relative path to the file
file_path = os.path.join(root, file_entry)
print(file_path)
We can also use root + os.sep() + file_entry to achieve the same effect, but it is not as Pythonic as the method we're using to join paths. Using os.path.join(), we can pass two or more strings to form a single path, such as directories, subdirectories, and files.

When we run the preceding script with our example input directory, we see the following output:

As seen, the os.walk() method iterates through a directory, then will descend into any discovered sub-directories, thereby scanning the entire directory tree.

There's more…

This script can be further improved. Here's a recommendation:

  • Check out and implement similar functionality using the glob library which, unlike the os module, allows for wildcard pattern recursive searches for files and directories