Learning Python for Forensics

By: Chapin Bryce

Overview of this book

This book will illustrate how and why you should learn Python to strengthen your analysis skills and efficiency as you creatively solve real-world problems through instruction-based tutorials. The tutorials use an interactive design, giving you experience of the development process so you gain a better understanding of what it means to be a forensic developer. Each chapter walks you through a forensic artifact and one or more methods to analyze the evidence. It also provides reasons why one method may be advantageous over another. We cover common digital forensics and incident response scenarios, with scripts that can be used to tackle case work in the field. Using built-in and community-sourced libraries, you will improve your problem-solving skills by adding the Python scripting language to your toolkit. In addition, we provide resources for further exploration of each script so you can understand what further purposes Python can serve. With this knowledge, you can rapidly develop and deploy solutions to identify critical information and fine-tune your skill set as an examiner.

When to use Python?

Python is a powerful forensic tool, but before deciding to develop a script it is important to consider the type of analysis required and the project timeline. In the following examples, we will outline situations when Python is invaluable and, conversely, when it is not worth the development effort. Though rapid development makes it easy to deploy a solution in a tough situation, Python is not always the best tool to implement it. If a tool exists that performs the task, and is available, it can be the preferred method for analysis.

Python is a preferred programming language for forensics due to its ease of use, library support, detailed documentation, and interoperability among operating systems. There are two main types of programming languages: those that are interpreted and those that are compiled. Compiling converts source code into machine language, a lower-level form that is more efficient for the computer to execute. Interpreted languages are not as fast as compiled languages at run time, but they do not require a compilation step, which can take some time. As Python is an interpreted language, we can make modifications to our code and quickly run and view the results. With a compiled language, we would have to wait for our code to re-compile before viewing the effect of our modifications. For this reason, Python may not run as quickly as a compiled language; however, it allows rapid prototyping.

An incident response case presents an excellent example of when to use Python in a case setting. For example, let us consider that a client calls, panicked, reporting a data breach and unsure of how many files were exfiltrated over the past 24 hours from their file server. Once on site, you are instructed to perform the fastest count of files accessed in the past 24 hours. This count, along with a list of compromised files, will determine the course of action.

Python fits this bill quite nicely. Armed with just a laptop, you can open a text editor and begin writing a code solution. Python can be built and designed without the need of a fancy editor or tool set. The build process of your script may look similar to this, with each step building upon the previous:

Make the script read a single file's last accessed timestamp.
Write a loop that steps through directories and subdirectories.
Test each file to see if the timestamp is within the past 24 hours.
If the file has been accessed within the past 24 hours, add it to a list of affected files that records the file path and access time.

The preceding process would result in a script that recurses over the entire server and outputs the files found with a last accessed time in the past 24 hours for manual review. This script will be approximately 20 lines of code and might require ten minutes, or less, for an intermediate scripter to develop and validate (it is apparent that this will be more efficient than manually reviewing the timestamps on the filesystem).
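
The four steps above can be sketched roughly as follows. This is a minimal illustration rather than the book's own script: the function name recently_accessed and its window and now parameters are hypothetical, and it assumes the filesystem actually records last accessed times, which some systems disable or update lazily.

```python
import os
import time

SECONDS_PER_DAY = 24 * 60 * 60


def recently_accessed(root, window=SECONDS_PER_DAY, now=None):
    """Return (path, atime) pairs for files under root accessed in the last window seconds."""
    now = time.time() if now is None else now
    hits = []
    # Step 2: walk every directory and subdirectory below root.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Step 1: read the file's last accessed timestamp.
                atime = os.stat(path).st_atime
            except OSError:
                # Skip unreadable or vanished files; a real tool would log these.
                continue
            # Steps 3 and 4: keep files whose timestamp falls inside the window.
            if now - atime <= window:
                hits.append((path, atime))
    return hits
```

Calling recently_accessed with the path of the share to examine returns a list that can be printed or written to a report, for example by formatting each access time with time.ctime. Accepting now as a parameter is a design choice that makes the function repeatable against test data with known timestamps.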

Before deploying any developed code, it is imperative that you validate its capability first. As Python is not a compiled language, we can easily run the script after adding new lines of code to ensure that we haven't broken anything. This approach is known as test-then-code, a method commonly used in script development. Any software, regardless of who wrote it, should be scrutinized and evaluated to ensure accuracy and precision. Validation ensures that the code is operating properly. Although it is more time consuming, validated code provides reliable results capable of withstanding the courtroom, which is an important aspect in forensics.
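
One way to validate a script like this is to run its core logic against inputs whose correct answers were worked out by hand. The sketch below assumes a hypothetical helper, accessed_within, that holds the 24-hour comparison; the names and test values are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def accessed_within(atime, now, window=SECONDS_PER_DAY):
    """Return True if atime is no more than window seconds before now."""
    return now - atime <= window


def validate():
    """Check the helper against hand-computed cases and return how many were checked."""
    now = 1000000  # fixed reference time so the results are repeatable
    cases = [
        (now - 60, True),                    # accessed one minute ago
        (now - SECONDS_PER_DAY, True),       # exactly on the 24-hour boundary
        (now - SECONDS_PER_DAY - 1, False),  # one second too old
    ]
    for atime, expected in cases:
        # An AssertionError here means the logic disagrees with the known answer.
        assert accessed_within(atime, now) is expected, (atime, expected)
    return len(cases)
```

Re-running validate() after each change gives a quick signal that nothing has been broken, and boundary cases such as the exact 24-hour mark are where mistakes are most likely to hide.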

A situation where Python might not be the best tool is for general case analysis. If you are handed a hard drive and asked to find evidence without additional insight, then a preexisting tool will be a better solution. Python is invaluable for targeted solutions, such as analyzing a given file type and creating a metadata report. Developing a custom all-in-one solution for a given filesystem requires too much time to create, when other tools, both paid and free, support such generic analysis.

Python is useful in pre-processing automation. If you find yourself repeating the same task for each evidence item, it may be worthwhile to develop a system that automates those steps. A great example of a suite that performs such analysis is ManTech's Analysis and Triage System (MantaRay^[1]), which leverages a series of tools to create general reports that can speed up analysis when there is no scope of what data may exist.

When considering whether to commit resources to develop Python scripts, either on the fly or for larger projects, it is important to consider what solutions already exist, the time available to create a solution, and the time saved through automation. Despite the best intentions, the development of solutions can go on for much longer than initially conceived without a strong design plan.

Development life cycle

The development life cycle involves at least five steps:

Identify
Plan
Program
Validate
Bugs

The first step is self-explanatory: before you develop, you must identify the problem that needs to be solved. Planning is perhaps the most crucial step in the development cycle.

Good planning will help you by decreasing the amount of code required and the number of bugs. Planning becomes even more vital during the learning process. A forensic programmer must begin to answer the following questions: how will data be ingested, what Python data types are most appropriate, are third-party libraries necessary, and how will the results be displayed to the examiner? In the beginning, just as if we were writing a term paper, it is a good idea to write, or draw, an outline of your program. As you become more proficient in Python, planning will become second nature, but initially it is recommended that you create an outline or write pseudocode.

Pseudocode is an informal way of writing code before filling in the details with actual code. Pseudocode can represent the barebones of the program, such as defining pertinent variables and functions while describing how they will all fit together within the script's framework. Pseudocode for a function might look like the following example:

# open the database
# read from the database using the sqlite3 library – store in variable called records
for record in records:
    # process database records here
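
Once the outline is settled, the pseudocode might be filled in along these lines. This is only a sketch using the standard sqlite3 library: the function name read_records and the idea of passing a table name are assumptions made for illustration, and in practice you would run it against a working copy of the evidence rather than the original file.

```python
import sqlite3


def read_records(db_path, table):
    """Open a SQLite database and return every row of one table as a list of tuples."""
    # Table names cannot be bound as query parameters, so accept plain identifiers only.
    if not table.isidentifier():
        raise ValueError("unexpected table name: %r" % table)
    # open the database
    conn = sqlite3.connect(db_path)
    try:
        # read from the database and store the rows in a variable called records
        records = conn.execute('SELECT * FROM "%s"' % table).fetchall()
    finally:
        conn.close()
    return records


def process_records(records):
    """Placeholder processing step: print each record."""
    for record in records:
        # process database records here
        print(record)
```

Keeping the reading and processing steps in separate functions mirrors the outline and makes each piece easier to test on its own.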

After identifying and planning, the next three steps make up the largest part of the development cycle. Once your program is sufficiently planned, it is time to start writing the code! Once the code is written, try to break your new program with as much test data as possible. Especially in forensics, it is critical to thoroughly test your code instead of relying on the results of one example. Without comprehensive testing and debugging, the code can crash when it encounters something unexpected, or, even worse, it could provide the examiner with false information and lead them down the wrong path. After the code is tested, it is time to release it and prepare for bug reports. We are not just talking about insects! Despite a programmer's best efforts, there will always be bugs in the code. Bugs have a nasty way of multiplying even when you squash one, causing the programming cycle to begin again and again.
