Book Image

Windows Malware Analysis Essentials

By : Victor Marak
Book Image

Windows Malware Analysis Essentials

By: Victor Marak

Overview of this book

Table of Contents (13 chapters)

Performing binary reconnaissance


The PE format is the executable binary format in Windows. The overall structure of a PE file is shown in the exhibit; the PE file has a bunch of headers, which are metadata for the Windows loader to help load the image to process memory. The MZ or DOS header starts with the MZ or 0x4D 0x5A magic number. The 4-byte value at offset 0x3C from the offset 0x0 of the MZ header gives the location of the start of the PE header, which has the signature 'PE\0\0' or 0x50 0x45 0x0 0x0. The PE header contains the optional header, which is a legacy term and is certainly not optional. Thereafter, the section header begins, which contains the metadata describing the sections and their properties—section name, raw and virtual size, and address and section characteristics. Thereafter, the sections themselves are linearly appended, one after the other.

The following is an excerpt from the COFF specifications:

The header data structures are as follows: Several underground articles also exist that do an admirable job of documenting the PE format; Goppit's PE tutorial, B. Luevelsmeyer's tutorial, and Kris Kaspersky's Hacker Debugging and Hacker Disassembling are also good references. Those who are interested in the golden age of Windows reverse engineering must search for + ORC's legacy on the Internet.

Tip

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Scanning malware on the web

If you can afford to submit a malware sample (say, make it public) or a hash of the sample to online scanners, you should be able to get a detection report and submit the sample for posterity, particularly if it is undetected and actually malware. Virustotal.com and Jotti.com as online scanners, and Anubis, Minibis, Cuckoo, BitBlaze, Comodo, ThreatExpert, Zero Wine, and BSA-Buster Sandbox Analyzer (potential abandonware though works for user mode malware that can run on XP and Win 7) are dependable among the myriad of scanners that fulfill this role. Installed antivirus products on your main host system or virtual machines can also be availed of to get the preliminary screening done; they have to be updated prior to scanning to ensure that they do not miss available signatures. Most have On-Demand Scanning, which is manually invoking the scanning process on a user-selected file, usually through shell integration (also known as right-click context menu item) in Windows. This is different from On-Access Scanning, which intercepts I/O calls and network activity via kernel filesystem and network filter drivers in all running OS processes, as well as removable media and network downloads to the filesystem, and proceeds with scanning them for malicious code, letting them continue if they are benign or else proceeding with termination, quarantine, or deletion of any malware present. Much of the toolsets that we will implement will focus on the PE/Coff format. Streamlining your toolkit must be an ongoing pursuit in addition to creating custom scripts and tools as and when needed; gear lust is not as important as knowing how to use the ones that you have inside out. Even if all the tools are taken away from an analyst, if he/she has a debugger, disassembler, hex editor, and development IDE (arguably, the only tool required as others can be made from this given enough time and motivation (and brains)), he or she can still fulfil their role. Before you examine the PE format, let us look at the battery of tools that we can incorporate in our daily analysis sessions for binary reconnaissance. Make sure that you have the most recent and updated versions of the following tools as this software can and will contain vulnerabilities that can be exploited, particularly given the context of malware analysis. The PE format is very well documented in MSDN: https://msdn.microsoft.com/en-us/windows/hardware/gg463119.aspx.

Further, a good graphic of the PE format is available at https://raw.githubusercontent.com/corkami/pics/master/PE101.png.

Getting a great view with PEView

PEView exudes simplicity and a good GUI design as compared to other click-and-browse tools while being a very robust format parser for every executable that you can throw at it. The tree control to the left categorically breaks up a PE binary into its constituents. The right view displays the hex dump or the section and header attributes. In the figure, the .text code section is selected with the hexdump view towards the right pane:

Know the ins and outs with PEInsider

PEInsider from Cerbero is a more recent alternative to the Explorer Suite (CFF Explorer), its earlier offering. This is one of the more thoughtful designs with highly informative context-based displays. The overall features are similar to those of PEView.

Identifying with PEiD

PEiD is a now defunct binary utility that is still immensely useful in detecting packers, compressors, and compiler/linkers, and has a slew of PE format-related features, including a concise GUI display of the most pertinent attributes of the PE executable, information gathered from the PE header/Optional Header/Data Directories/Section Headers, a task viewer to enumerate and manipulate running processes, including taking memory dumps and a dependency list, shell integration (right-click on Explorer.exe), and entropy calculation. A mini hex viewer and basic disassembler with control flow navigation (jmp/call) is directly linked with the PE section(s) view context menu along with section header attributes, which enables inbuilt navigation to the binary file for investigation. Signatures for packers and similar utilities are compiled in a text file called userdb.txt. You can add your own custom signatures to the database.

The two main scanning modes are normal and deep. Normal mode scans the original entry point (OEP) of the executable for a match in the database signatures. Deep mode scans the section containing the OEP for a more accurate match. There is a hardcore mode that scans the entire file and thus, is used as a last resort. Finding the OEP can be a challenge many a times, if the packer is unknown.

Some useful plugins are included such as KANAL plugin for crypto analysis and the Generic OEP finder.

In the following image, the packer/compiler display label shows PECompact 2.x -> Jeremy Collake, which is the detection of the packer used in the input sample, PEiD itself in this case.

Obfuscated files and packed/compressed files are somewhat different implementations of the process of concealing something in the obfuscated binary that aims to confuse the human analyst as well as automatic code analyzers, while retaining the functionality of the original binary. The various obfuscation methods include source code- or binary code-specific manipulations, such as string mangling and the removal of source code variables and function identifiers to something seemingly random and redundant, thus increasing the complexity of the control flow analysis. Packing and compression focuses on code compaction and the reduction of the total binary file image footprint. This procedure increases the entropy and hence the obfuscating binary, to a certain extent. The PE file header values and sections are non-standard (manipulation of linker-generated headers and sections) and packer-specific; they rebuild the compressed import and export tables in the memory, which is usually done by an unpacking stub.

Some code armoring is also done using virtualization, and an intermediate representation of the original assembly instructions that are mapped at runtime and JIT compiled by the execution engine. Care must be taken not to invoke the OEP finder plugins and the other unpacking plugins for PEiD, as doing so will execute the malware on the system and can possibly infect it during the analysis.

Walking on frozen terrain with DeepFreeze

DeepFreeze from Faronics is a great utility that has been around for quite some time now. It is excellent for use with the main standalone host system, and particularly, with a virtualization setup using VMware or VirtualBox to preserve system integrity. The baseline state is saved when you activate it from the notification bar in Windows, and post installation, execution (deliberate or accidental), and analysis of malware and the related packet captures, dropped files, and memory dumps, you can simply revert to your original baseline as many times as you like. Uninstalling can be a bit tricky for some installations (BIOS settings have to be manipulated), and be sure to remember the login password because even uninstalling from the Control Panel will not proceed if you lose the password and you could be stuck with a frozen machine that persists nothing else.

Meeting the rex of HexEditors

Hex editors are an essential utility to work with binary executables as they enable an as-is unbiased view of the file format that is as is and not tainted by any kind of parsing logic of a format viewer. This enables you to work at the byte level and build entirely new binary executables from scratch. Utilities are available that allow ease of editing on PE files built by motivated individuals over the course of research and coding cycles, which makes this look easy; however, when required, a well-featured hex editor gives unfettered access to working manually with import and export tables, PE headers, sections, overlays, wire captures, and memory dumps, among other assets. It is simply a hexadecimal view of a binary file, expressed as a binary string. Hex editors vary in the gamut of features provided out of the box; some of the more important ones include multiple file views, data structure templates, data type viewing, entropy calculation, hashing tools, strings search, offset navigation, bookmarks, and color mapping, file statistics, Boolean operations, encryption/decryption, support for large files/alternate data streams/filesystems, live acquisition, forensics utilities, and even a disassembler.

Hex workshop is a very well-featured hex editor with many of the features discussed. 010 Editor has a great interface with flexible templates for data structures. Flex Hex supports large files and alternate data streams. WinHex is more of a forensics-focused hex editor with RAM editing and MBR/hard disk-focused features.

Digesting string theory with strings

Sysinternals Suite has many useful utilities, and the strings.exe command-line utility does this one task very well. It takes the target file or folder as an input.

The –a and –u switches can be used to display only ASCII or UNICODE as both are displayed by default. The –o switch appends the offset of each detected string in the file. –s can be used to recursively traverse a parent folder. The string length of 3 is set by default and can be changed using the –n <length> switch.

In the following figure, we can see the familiar fileoffset:string combination display of the MZ header and the standard section names .text, .rdata, .data, and .rsrc with the rest resembling junk. Version strings, API names, URIs, URLs, IP addresses, password lists, FTP login strings, HTTP GET requests, mutex names, HTML pages, IFrame tags, and embedded scripts are among some of the categorical strings that you can look out for during malware analysis.

Bintext from FoundStone Inc. is a GUI strings tool and is another solid complement to the Sysinternal strings utility.

For runtime strings analysis and image verification (Image tab | Verify button) of digital signing by certification root authority, you can use Process Explorer from Sysinternals. Although an execution monitoring tool, this tool has numerous features that can be utilized for malware analysis. Double-click the process name, navigate to the Strings tab, and choose the Memory radio button to see the strings in the process memory.

If the malware is packed or compressed, the Image strings and Memory strings will be significantly different as the memory-unpacked regions have their packed strings unfurled.

The ASCII chart is given next for reference. You would read it like a grid value from the row and append the column value to obtain the resulting decimal value. The ASCII character '1' is read from row 4, and the column value is 9. You concatenate 4 and 9 to get 49, which is also 0x31.

ASCII values are 7-bit encoded numeric codes assigned to symbols, which consist of natural numbers, upper case and lower case letters of the English alphabet, punctuation symbols, carriage return, line feed, null, backspace, and other special characters. Because 7 bits are used, there are 2^7 or 128 characters that can be used. ASCII provides for the first 128 characters of Unicode. Unicode uses 16 bits, and hence, provides for more than 65,000 numeric codes, out of which about 40,000 are already used, which leaves a lot of codes for future use.

Hashish, pot, and stashing with hashing tools

Hashing utilities are useful for identifying and watermarking files and malware samples for further processing and for posterity. The MD5 and SHA-1/2 variants are the most commonly used algorithms. FileAlyzer 2 has a comprehensive hashing feature (the Hashes tab) and an interesting set of features for preliminary binary and text file (scripts and html) investigation, including network support for online malware scanners such as https://www.virustotal.com/. Shell integration means that any file can be analyzed from the Windows Explorer context menu, which makes it the first tool that you may ideally use for initial triage. It packs quite a bit, including a disassembler, a hex viewer, packer/compiler detection, support for archive files, ssdeep hashing using imported DLLs (ssdeep.dll), and Clam AV scanning (libclamav.dll).

Nirsoft's (Nir Sofer) HashMyFiles utility is an excellent Windows-based GUI hashing tool. It takes a file or a folder as input and lists out in columns hashes for MD5, SHA-1, CRC32, SHA-256, SHA-512, and SHA-384. It also displays the created and modified times, the full path of the file, file size, version strings, file extensions, identical files, file attributes, and https://www.virustotal.com/ submission.

Hashing tools can be used to generate malware database hash lists, as well as for checking the integrity of the existing binaries. Hashing also plays an important role in antivirus signature creation. During and after analyses, you would be ideally using a hex editor to create checksums and hashes of byte regions with parameters such as the number of bytes to hash, as well as the start offset and with additional parameters from the OEP or some offset of it or backwards into a file or from the last byte of a file and hashes of overlays, among other options. Once you have the required hashes, you would be customizing and compiling them for each binary by using vendor-specific VDL (Virus Definition Language) or Signature SDK, which exposes a set of APIs that the antimalware engine utilizes for detection during scanning. Quite a few of the vendors have internal and networked point-and-click interface systems for the analyst exposing features of the sample(s) and generating hashes and processing them directly from the sample queues without even having to download the sample binaries for a local system analysis, unless mandated. The relatively simple 1:1 static signatures look something like the following:

Trojan "Win32/Agent.AB" // malware is a trojan, nomenclature of Agent and a variant name of AB
{
  entrypoint(0x15B8);   //entrypoint in the raw file image HashEntrypoint(0x0,0x4B0,0x127C099B); //hash of OEP dropout till 0x4B0 of malicious data
  HashFileStart(0x3E15,0x1BE0,0x00E44360); //data island in code section + related code
}

A moving window and wildcard-based signature looks like the following:

Name:Dropper.Malware2.AA
{

$1 = 4A 4F 38 34 30 31 50 52 49 4E 43 50 48 41 53 54 41 54 5C 54 65 6D 70 5C

$2 =50 FF 90 58 02 00 00 59 33 C0 C3 55 8B EC 81 EC 0C [4-6] 8B 75 08 57 8D BE F8 04 00 00 57 33 DB 53 6A 04 FF 96 34 03 ?? ?? 85 C0 0F 85 B4 00 00 00

$3=HashResourceIcon(0x43547687);

}

Here, the pattern is in hex bytes; the square brackets denote that for the length of the next 89 bytes, if you find the value 30h, then you can continue with the rest of the signature. The ?? wildcard means that any byte value can be present after EBh and before 40h. Since the OEP is not specified, the OEP is searched first and then the beginning of the code section and then the top and tail of the file, including all sections and overlays (except the code section that is not scanned again). The resource malware-specific icon asset is checked for a checksum match.

Polymorphic malware is checked for the decryption stub and the decrypted malware code and other file properties to enable robust detection without using brute force on the keyspace or doing exactly that for oligomorphic malware that has a fixed or a feasibly finite number of decryption keys.

These basic signatures are then recompiled to a performance-efficient custom binary format before they are fed to the antimalware signature database. They are also made modular so that live updates are possible. The other variant is generic signatures wherein heuristics and data mining algorithms are implemented to capture the essence of a malware family or generation or fingerprint a new one if it is a variant. This requires creating a failsafe set of conditions in the format that is specified by the generic detection engine, usually as a detection script that returns a positive detection if all or most of the conditions are met. This is usually a more involved effort and requires judicious testing for false positives and false negatives till the point of diminishing returns. API sequence profiling and instruction opcode statistical analysis are some of the methods that can be used to provide inputs for generic signatures.

To get an idea of how antivirus products can be analyzed, have a look at the following links:

Here is a paper on fuzzy hash: http://jessekornblum.com/presentations/cdfsl07.pdf.

Getting resourceful with XNResource Editor

XNResourceEditor is a well-featured and easy-to-use resource editor utility for executable files. The resources are a set of binary assets that are compiled using a resource compiler to the format expected by the PE specification, which the linker finally integrates into the resulting executable. Usually the .rsrc section in the executable contains the compiled resources. The Bitmap section displays bitmaps that are used by the executable; the Dialog section displays dialog items implemented in the executable, which could be the main interface template; the Version strings contain properties such as ProductVersion, ProductName, FileVersion, FileDescription, LegalCopyright, LegalTrademarks, and CompanyName, which could be used to add detection logic post analysis; String Table contains null-terminated Unicode strings (Windows is fully Unicode, although ASCII text can also be present in the binary); Cursor Group contains cursor files; and Icon Group contains icon files.

Many malware binaries contain junk text- or malware-specific identifiers in the Version section and the String Table section. Resource sections can contain malicious code as they can be any binary asset, and hence, are important for the purposes of investigation. Further, even if an executable is not executing or is corrupted in the code, possibly post re-infection or an error in the unpacking algorithm, the OEP might be patched badly or might require a condition for redirection among a host of other reasons. If the resource section is relatively unaffected, you can still investigate the binary prima facie with a resource editor as the dialogs and strings, as well as the interface layout and icons, can reveal a lot about the binary in question. You can then use the Internet to gather more information about the resource assets. The assets themselves can be examined for validity and embedded shellcode or malware. Anti-malware signatures for fake antivirus and spyware (where the resource icons, strings, bitmaps, version strings, and for that matter, any confirmed malicious asset in the resource section) can be included in composite malware signatures and in the detection logic post the extraction of resources.

Too much leech with Dependency Walker

Depends.exe, or Dependency Walker, is a very thorough tool for providing detailed listings of all dependencies that are statically linked and dynamically called via imports or delay-load imports.

In the following image, the sample bcb6kg.EXE file is set as input (drag and drop). We see the list of imported DLLs on the left-most pane tree control. If you select any entry in the list, such as Kernel32.dll, the adjacent top-right pane contains the list of functions imported from Kernel32.dll. The pane just below it will display the complete set of functions exported by Kernel32.dll, which depending on the context, might not be too useful as of now. The pane above the bottom pane displays the binary information of the modules that will be loaded by the sample executable. The bottom pane displays error messages and the log output of the activities.

The runtime profiling feature F7 ensures that dependencies that cannot be resolved via a binary static analysis will be integrated into the report. Of course, certain libraries might not be invoked without external input or user intervention, and in such a case, it will not be detected, but in most cases, it will do a reasonably good job:

Getting dumped by Dumpbin

This is the Swiss army knife for everything PE from Microsoft. It comes with the MASM32 SDK as well as most of the developer SDKs by Microsoft, including Visual Studio installations (via Visual Studio Command Prompt).

Simply type DUMPBIN /ALL <filename> to dump everything about a PE file. You could append it to a text file for ease of recall as the console buffer will run out.

For simple automation, you could type the following in cmd.exe on a folder of binary samples for batch processing (replace the %i variable with %%i for a .bat batch file).

FOR %i in (*.exe *.dll) do dumpbin /imports %i >> imports.txt

Dumpbin.exe has to be in the current folder or configured in the environment variables. The preceding command enumerates all the .exe and .dll files, invokes dumpbin with the /imports switch, and appends the output to imports.txt in the current folder. You can replace the switches accordingly. The following screenshot shows how the imports are reported in Dumpbin with the virtual addresses (with respect to the image base of the executable) and function name hint values as well as the function name strings in their own columns. In case the function names are not present, the function ordinals are used instead, which are just numbers (from #1 and not 0).

The /DISASM option produces an acceptable disassembly of the code with the :bytes (default) or :nobytes option to display the hex opcodes or just the assembly mnemonic listings; you need to type the following line of code to display just the assembly listing:

dumpbin /diasm:nobytes<filename>

With an array of essential tools at your disposal, you may think that it would be redundant to have tools that can implement possibly much of the available toolset. In the good old days, reverse engineering started with plain Jane developmental tools such as basic debuggers, printouts, and paper and pencil. Notwithstanding the culture of homegrown tools by the underground elite, as the industry developed over the years, specialized tools (free and commercial) started being developed as a result of R&D. We will discuss two such tools for our purposes of a static analysis of a binary executable and incorporate disassembly analyses in the equation:

  • PE Explorer: This is a lightweight doppelganger of IDA Pro with a lower price tag but having a similar feature set and possibly a more integrated feature set, while not as extensive as IDA Pro.

    A relative new comer with good looking prospects is available at https://www.relyze.com, which provides much of the features you would expect from IDA Pro albeit in a more streamlined interface.

    You can also check out Hopper disassembler if you are reverse engineering on MacOSX which also does PE files and has a very unique well designed feel to it. Visit http://hopperapp.com for more.

  • IDA Pro: The industry standard for binary reverse engineering. We will cover scenarios and the multitude ways of analyzing native binaries by using it.