In modern computer forensics, we are tasked with examining massive datasets for evidence that supports or refutes an event. Cases commonly involve multiple devices or large amounts of data. Faced with that volume, an examiner must filter out the information that is not relevant to the case and identify the data that is of interest. This identification process is time-consuming, even with current tools. In this chapter, we are going to explore Python solutions that can help us identify known files in a folder, or in a mounted evidence container, in an automated manner.
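To make the goal concrete, here is a minimal sketch of one way to flag known files under a folder: hash every file and look the digest up in a set of known values. It is an illustration only, not the chapter's own script. The function names, the choice of SHA-256, and the `known_hashes` dictionary (digest mapped to a label) are all assumptions made for this example.

```python
import hashlib
import os


def hash_file(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read until read() returns an empty bytes object (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_known_files(root, known_hashes):
    """Yield (path, label) for each file under root whose digest is in known_hashes.

    known_hashes is a hypothetical dict mapping hex digests to labels
    such as "normal", "malicious", or "notable".
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                file_hash = hash_file(path)
            except OSError:
                # Skip files that cannot be read (permissions, broken links).
                continue
            label = known_hashes.get(file_hash)
            if label is not None:
                yield path, label
```

Pointing `find_known_files` at a folder or at the mount point of an evidence container returns only the files whose digests appear in the list. Note that this is an exact-match lookup: a file that differs from a known file by a single byte produces a different digest and is not reported.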
Commonly, a whitelist or blacklist can help us identify known files on a system by matching hash values. If a file's hash value matches an entry in the list, we can identify that file as normal, malicious, or otherwise notable. But what if a file is not an exact match? This is an issue with the traditional cryptographic hashes we use in forensics to generate a unique hash based...