Book Image

Learning Python for Forensics - Second Edition

By : Miller, Chapin Bryce
Book Image

Learning Python for Forensics - Second Edition

By: Miller, Chapin Bryce

Overview of this book

Digital forensics plays an integral role in solving complex cybercrimes and helping organizations make sense of cybersecurity incidents. This second edition of Learning Python for Forensics illustrates how Python can be used to support these digital investigations and permits the examiner to automate the parsing of forensic artifacts to spend more time examining actionable data. The second edition of Learning Python for Forensics will illustrate how to develop Python scripts using an iterative design. Further, it demonstrates how to leverage the various built-in and community-sourced forensics scripts and libraries available for Python today. This book will help strengthen your analysis skills and efficiency as you creatively solve real-world problems through instruction-based tutorials. By the end of this book, you will build a collection of Python scripts capable of investigating an array of forensic artifacts and master the skills of extracting metadata and parsing complex data structures into actionable reports. Most importantly, you will have developed a foundation upon which to build as you continue to learn Python and enhance your efficacy as an investigator.
Table of Contents (15 chapters)

Standard data types

With our first script complete, it is time to understand the basic data types of Python. These data types are similar to those found in other programming languages, but are invoked with a simple syntax, which is described in the following table and sections. For a full list of standard data types available in Python, visit the official documentation at https://docs.python.org/3/library/stdtypes.html:

Data Type
Description
Example

Str

String

str(), "Hello", 'Hello'

Unicode

Unicode characters

unicode(), u'hello', "world".encode('utf-8')

Int

Integer

int(), 1, 55

Float

Decimal precision integers

float(), 1.0, .032

Bool

Boolean values

bool(), True, False

List

List of elements

list(), [3, 'asd', True, 3]

Dictionary

Set of key:value pairs used to structure data

dict(), {'element': 'Mn', 'Atomic Number': 25, 'Atomic Mass': 54.938}

Set

List of unique elements

set(), [3, 4, 'hello']

Tuple

Organized list of elements

tuple(), (2, 'Hello World!', 55.6, ['element1'])

File

A file object

open('write_output.txt', 'w')

We are about to dive into the usage of data types in Python, and recommend that you repeat this section as needed to help with comprehension. While reading through how data types are handled is important, please be at a computer where you can run Python when you work through it the first few times. We invite you to explore the data type further in your interpreter and test them to see what they are capable of.

You will find that most of our scripts can be accomplished using only the standard data types Python offers. Before we take a look at one of the most common data types, strings, we will introduce comments.

Something that is always said, and can never be said enough, is to comment your code. In Python, comments are formed by any line beginning with the pound, or more recently known as the hashtag, # symbol. When Python encounters this symbol, it skips the remainder of the line and proceeds to the next line. For comments that span multiple lines, we can use three single or double quotes to mark the beginning and end of the comments rather than using a single pound symbol for every line. What follows are examples of types of comments in a file called comments.py. When running this script, we should only see 10 printed to the console as all comments are ignored:

# This is a comment
print(5 + 5) # This is an inline comment.
# Everything to the right of the # symbol
# does not get executed
"""We can use three quotes to create
multi-line comments."""

The output is as follows:

When this code is executed, we only see the preceding at the console.

Strings and Unicode

Strings are a data type that contain any character, including alphanumeric characters, symbols, Unicode, and other codecs. With the vast amount of information that can be stored as a string, it is no surprise they are one of the most common data types. Examples of areas where strings are found include reading arguments at the command line, user input, data from files, and outputting data. To begin, let us look at how we can define a string in Python.

There are three ways to create a string: with single quotes, double quotes, or with the built-in str() constructor method. Note that there is no difference between single- and double-quoted strings. Having multiple ways to create a string is advantageous, as it allows us to differentiate between intentional quotes within a string. For example, in the 'I hate when people use "air-quotes"!' string, we use the single quotes to demarcate the beginning and end of the main string. The double quotes inside the string will not cause any issues with the Python interpreter. Let's verify with the type() function that both single and double quotes create the same type of object:

>>> type('Hello World!')
<class 'str'>
>>> type("Foo Bar 1234")
<class 'str'>

As we saw with comments, a block string can be defined by three single or double quotes to create multi-line strings. The only difference is whether we do something with the block-quoted value or not:

>>> """This is also a string""" 
This is also a string
>>> '''it
can span
several lines'''
it\ncan span\nseveral lines

The \n character in the returned line signifies a line feed or a new line. The output in the interpreter displays these newline characters as \n, though when fed into a file or console, a new line is created. The \n character is one of the common escape characters in Python. Escape characters are denoted by a backslash following a specific character. Other common escape characters include \t for horizontal tabs, \r for carriage returns, \', \", and \\ for literal single quotes, double quotes, and backslashes, among others. Literal characters allow us to use these characters without unintentionally using their special meaning in Python's context.

We can also use the add (+) or multiply (*) operators with strings. The add operator is used to concatenate strings together, and the multiply operator will repeat the provided string values:

>>> 'Hello' + ' ' + 'World'
Hello World
>>> "Are we there yet? " * 3
Are we there yet? Are we there yet? Are we there yet?

Let's look at some common functions we use with strings. We can remove characters from the beginning or end of a string using the strip() function. The strip() function requires the character we want to remove as its input, otherwise it will replace whitespace by default. Similarly, the replace() function takes two inputs the character to replace and what to replace it with. The major difference between these two functions is that strip() only looks at the beginning and end of a string:

# This will remove colon (`:`) from the beginning and end of the line
>>> ':HelloWorld:'.strip(':')
HelloWorld


# This will remove the colon (`:`) from the line and place a
# space (` `) in it's place
>>> 'Hello:World'.replace(':', ' ')
Hello World

We can check if a character or characters are in a string using the in statement. Or, we can be more specific, and check if a string startswith() or endswith() a specific character(s) instead (you know a language is easy to understand when you can create sensible sentences out of functions). These methods return True or False Boolean objects:

>>> 'a' in 'Chapter 2'
True
>>> 'Chapter 1'.startswith('Chapter')
True
>>> 'Chapter 1'.endswith('1')
True

We can quickly split a string into a list based on some delimiter. This can be helpful to quickly convert data separated by a delimiter into a list. For example, comma-separated values (CSV) data is separated by commas and could be split on that value:

>>> print("Hello, World!".split(','))
["Hello", " World!"]

Formatting parameters can be used on strings to manipulate them and convert them based on provided values. With the .format() function, we can insert values into strings, pad numbers, and display patterns with simple formatting. This chapter will highlight a few examples of the .format() method, and we will introduce more complex features of it throughout this book. The .format() method replaces curly brackets with the provided values in order.

This is the most basic operation for inserting values into a string dynamically:

>>> "{} {} {} {}".format("Formatted", "strings", "are", "easy!")
'Formatted strings are easy!'

Our second example displays some of the expressions we can use to manipulate a string. Inside the curly brackets, we place a colon, which indicates that we are going to specify a format for interpretation. Following this colon, we specify that there should be at least six characters printed. If the supplied input is not six characters long, we prepend zeroes to the beginning of the input. Lastly, the d character specifies that the input will be a base 10 decimal:

>>> "{:06d}".format(42)
'000042'

Our last example demonstrates how we can easily print a string of 20 equal signs by stating that our fill character is the equals symbol, followed by the caret (to center the symbols in the output), and the number of times to repeat the symbol. By providing this format string, we can quickly create visual separators in our outputs:

>>> "{:=^20}".format('')
'===================='
While we will introduce more advanced features of the .format() method, the site https://pyformat.info/ is a great resource for learning more about the capabilities of Python's string formatting.

Integers and floats

The integer is another valuable data type that is frequently used—an integer is any whole positive or negative number. The float data type is similar, but allows us to use numbers requiring decimal-level precision. With integers and floats, we can use standard mathematical operations, such as: +, -, *, and /. These operations return slightly different results based on the object's type (for example, integer or float).

An integer uses whole numbers and rounding, for example dividing two integers will result in another whole number integer. However, by using one float in the equation, even one that has the same value as the integer will result in a float; for example, 3/2=1 and 3/2.0=1.5 in Python. The following are examples of integer and float operations:

>>> type(1010)
<class 'int'>
>>> 127*66
8382
>>> 66/10
6
>>> 10 * (10 - 8)
20

We can use ** to raise an integer by a power. For example, in the following section, we raise 11 by the power of 2. In programming, it can be helpful to determine the numerator resulting from the division between two integers. For this, we use the modulus or percent (%) symbol. With Python, negative numbers are those with a dash character (-) preceding the value. We can use the built-in abs() function to get the absolute value of an integer or float:

>>> 11**2
121
>>> 11 % 2 # 11 divided by 2 is 5.5 or 5 with a remainder of 1
1
>>> abs(-3)
3

A float is defined by any number with a decimal. Floats follow the same rules and operations as we saw with integers, with the exception of the division behavior described previously:

>>> type(0.123)
<class 'float'>
>>> 1.23 * 5.23
6.4329
>>> 27/8.0
3.375

Boolean and none

The integers 0 and 1 can also represent Boolean values in Python. These values are the Boolean False or True objects, respectively. To define a Boolean, we can use the bool() constructor statement. These data types are used extensively in program logic to evaluate statements for conditionals, as covered later in this chapter.

Another built-in data type is the null type, which is defined by the keyword None. When used, it represents an empty object, and when evaluated will return False. This is helpful when initializing a variable that may use several data types throughout execution. By assigning a null value, the variable remains sanitized until reassigned:

>>> bool(0)
False
>>> bool(1)
True
>>> None
>>>

Structured data types

There are several data types that are more complex and allow us to create structures of raw data. This includes lists, dictionaries, sets, and tuples. Most of these structures are comprised of the previously mentioned data types. These structures are very useful in creating powerful units of values, allowing raw data to be stored in a manageable manner.

Lists

Lists are a series of ordered elements. Lists support any data type as an element and will maintain the order of data as it is appended to the list. Elements can be called by position or a loop can be used to step through each item. In Python, unlike other languages, printing a list takes one line. In languages like Java or C++, it can take three or more lines to print a list. Lists in Python can be as long as needed and can expand or contract on the fly, another feature uncommon in other languages.

We can create lists by using brackets with elements separated by commas. Or, we can use the list() class constructor with an iterable object. List elements can be accessed by index where 0 is the first element. To access an element by position, we place the desired index in brackets following the list object. Rather than needing to know how long a list is (which can be accomplished with the len() function), we can use negative index numbers to access list elements in reference to the end (that is, -3 would retrieve the third to last element):

>>> type(['element1', 2, 6.0, True, None, 234])
<class 'list'>
>>> list((4, 'element 2', None, False, .2))
[4, 'element 2', None, False, 0.2]
>>> len([0,1,2,3,4,5,6])
7
>>> ['hello_world', 'foo bar'][0]
hello_world
>>> ['hello_world', 'foo_bar'][-1]
foo_bar

We can add, remove, or check if a value is in a list using a couple of different functions. The append() method adds data to the end of the list. Alternatively, the insert() method allows us to specify an index when adding data to the list. For example, we can add the string fish to the beginning, or 0 index, of our list:

>>> ['cat', 'dog'].append('fish')
# The list becomes: ['cat', 'dog', 'fish']
>>> ['cat', 'dog'].insert(0, 'fish')
# The list becomes: ['fish', 'cat', 'dog']

The pop() and remove() functions delete data from a list either by index or by a specific object, respectively. If an index is not supplied with the pop() function, the last element in the list is popped. Note that the remove() function only gets rid of the first instance of the supplied object in the list:

>>> [0, 1, 2].pop()
2
# The list is now [0, 1]

>>> [3, 4, 5].pop(1)
4
# The list is now [3, 5]
>>> [1, 1, 2, 3].remove(1)
# The list becomes: [1, 2, 3]

We can use the in statement to check if some object is in the list. The count() function tells us how many instances of an object are in the list:

>>> 'cat' in ['mountain lion', 'ox', 'cat']
True
>>> ['fish', 920.5, 3, 5, 3].count(3)
2

If we want to access a subset of elements, we can use list slice notation. Other objects, such as strings, also support this same slice notation to obtain a subset of data. Slice notation has the following format, where a is our list or string object:

a[x:y:z]

In the preceding example, x represents the start of the slice, y represents the end of the slice, and z represents the step of the slice. Note that each segment is separated by colons and enclosed in square brackets. A negative step is a quick way to reverse the contents of an object that supports slice notation and would be triggered by a negative number as z. Each of these arguments is optional. In the first example, our slice returns the second element and up to, but not including, the fifth element in the list. Using just one of these slice elements returns a list containing everything from the second index forward or everything up to the fifth index:

>>> [0,1,2,3,4,5,6][2:5]
[2, 3, 4]
>>> [0,1,2,3,4,5,6][2:]
[2, 3, 4, 5, 6]
>>> [0,1,2,3,4,5,6][:5]
[0, 1, 2, 3, 4]

Using the third slice element, we can skip every other element or simply reverse the list with a negative one. We can use a combination of these slice elements to specify how to carve a subset of data from the list:

>>> [0,1,2,3,4,5,6][::2]
[0, 2, 4, 6]
>>> [0,1,2,3,4,5,6][::-1]
[6, 5, 4, 3, 2, 1, 0]

Dictionaries

Dictionaries, otherwise known as dict, are another common Python data container. Unlike lists, this object does not add data in a linear fashion. Instead, data is stored as key and value pairs, where you can create and name unique keys to act as an index for stored values. It is important to note that, in Python 2, dictionaries do not preserve the order in which items are added to it. This is no longer true as of Python 3.6.5, though in general, we should not rely on the dict() object maintaining order for us. These objects are used heavily in forensic scripting, as they allow us to store data by name in a single object; otherwise, we may be left assigning a lot of new variables. By storing data in dictionaries, it is possible to have one variable contain very structured data.

We can define a dictionary by using curly braces ({}), where each key and value pair is delimited by a colon. Additionally, we can use the dict() class constructor to instantiate dictionary objects. Calling a value from a dictionary is accomplished by specifying the key in brackets following the dictionary object. If we supply a key that does not exist, we will receive a KeyError (notice that we have assigned our dictionary to a variable, a). While we have not introduced variables at this point, it is necessary to highlight some of the functions that are specific to dictionaries:

>>> type({'Key Lime Pie': 1, 'Blueberry Pie': 2})
<class 'dict'>
>>> dict((['key_1', 'value_1'],['key_2', 'value_2']))
{'key_1': 'value_1', 'key_2': 'value_2'}
>>> a = {'key1': 123, 'key2': 456}
>>> a['key1']
123

We can add or modify the value of a preexisting key in a dictionary by specifying a key and setting it equal to another object. We can remove objects using the pop() function, similar to the list pop() function, to remove an item in a dictionary by specifying its key instead of an index:

>>> a['key3'] = 789
>>> a
{'key1': 123, 'key2': 456, 'key3': 789}
>>> a.pop('key1')
123
>>> a
{'key2': 456, 'key3': 789}

The keys() and values() functions return a list of keys and values in the dictionary. We can use the items() function to return a list of tuples containing each key and value pair. These three functions are often used for conditionals and loops:

>>> a.keys()
dict_keys(['key2', 'key3'])
>>> a.values()
dict_values([456, 789])
>>> a.items()
dict_items([('key3', 789), ('key2', 456)])

Sets and tuples

Sets are similar to lists in that they contain a list of elements, though they must be unique items. With this, the elements must be immutable, meaning that the value must remain constant. For this, sets are best used on integers, strings, Boolean, floats, and tuples as elements. Sets do not index the elements, and therefore we cannot access the elements by their location in the set. Instead, we can access and remove elements through the use of the pop() method mentioned for the list method. Tuples are also similar to lists, though they are immutable. Built using parenthesis in lieu of brackets, elements do not have to be unique and of any data type:

>>> type(set([1, 4, 'asd', True]))
<class 'set'>
>>> g = set(["element1", "element2"])
>>> g
{'element1', 'element2'}
>>> g.pop()
'element2'
>>> g
{'element1'}
>>> tuple('foo')
('f', 'o' , 'o')
>>> ('b', 'a', 'r')
('b', 'a', 'r')
>>> ('Chapter1', 22)[0]
Chapter1
>>> ('Foo', 'Bar')[-1]
Bar

The important difference between a tuple and a list is that a tuple is immutable. This means that we cannot change a tuple object. Instead, we must replace the object completely or cast it to a list, which is mutable. This casting process is described in the next section. Replacing an object is very slow since the operation to add a value to a tuple is tuple = tuple + ('New value',), noting that the trailing comma is required to denote that this addition is a tuple.