Assembler – transforming assembly code into machine code

At this moment, you may notice that something is not quite right. The processor chips we use every day are not capable of executing text-based assembly code but are instead parsed into the machine code of the corresponding instruction set to perform the corresponding memory operations. Thus, during the compiling process, the previously mentioned assembly code is converted by the assembler into the machine code that can be understood by the chip.

Figure 1.2 shows the dynamic memory distribution of the 32-bit PE:

Figure 1.2 – 32-bit PE memory layout

Since the chip cannot directly parse strings such as hi there or info, data (such as global variables, static strings, global arrays, etc.) is first stored in a separate structure called a section. Each section is created with an offset address where it is expected to be placed. If the code later needs to extract resources identified during these compilation periods, the corresponding data can be obtained from the corresponding offset addresses. Here is an example:

The aforementioned info string can be expressed as \x69\x6E\x66\x6F\x00 in ASCII code (5 bytes in total with null at the end of the string). The binary data of this string can be stored at the beginning of the .rdata section. Similarly, the hi there string can be stored closely after the previous string at the address of the .rdata section at offset +5.
In fact, the aforementioned call MessageBoxA API is not understood by the chip. Therefore, the compiler will generate an Import Address Table struct, which is the .idata section, to hold the address of the system function that the current program wants to call. When needed by the program, the corresponding function address can be extracted from this table, enabling the thread to jump to the function address and continue executing the system function.
Generally speaking, it is the compiler’s practice to store the code content in the .text section.
Each individual running process does not have just one PE module. Either *.EXE or *.DLL mounted in the process memory is packaged in PE format.
In practice, each module loaded into the memory must be assigned an image base address to hold all contents of the module. In the case of a 32-bit *.EXE, the image base address would normally be 0x400000.
The absolute address of each piece of data in the dynamic memory will be the image base address of this module + the section offset + the offset of the data on the section. Take the 0x400000 image base address as an example. If we want to get the info string, the expected content will be placed at 0x402000 (0x400000 + 0x2000 + 0x00). Similarly, hi there would be at 0x402005, and the MessageBoxA pointer would be stored at 0x403018.

Important note

There is no guarantee that the compiler will generate .text, .rdata, and .idata sections in practice, or that their respective uses will be for these functions. Most compilers follow the previously mentioned principles to allocate memory. Visual Studio compilers, for example, do not produce executable programs with .idata sections to hold function pointer tables, but rather, in readable and writable .rdata sections.

What is here is only a rough understanding of the properties of block and absolute addressing in the dynamic memory; it is not necessary to be obsessed with understanding the content, attributes, and how to fill them correctly in practice. The following chapters will explain the meaning of each structure in detail and how to design it by yourself.

In this section, we learned about the transformation to machine code operations during program execution, as well as the various sections and offsets of data stored in memory that can be accessed later in the compiling process.

Tech Concepts

Programming languages

Tech Tools

Unlimited access to the largest independent learning library in tech of over 8,000 expert-authored tech books and videos.

Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.

50+ new titles added per month and exclusive early access to books as they are being written.

Windows APT Warfare

By : Sheng-Hao Ma

Windows APT Warfare

By: Sheng-Hao Ma

Overview of this book

Assembler – transforming assembly code into machine code

Windows APT Warfare

By : Sheng-Hao Ma

Windows APT Warfare

By: Sheng-Hao Ma

Overview of this book

Assembler – transforming assembly code into machine code

Confirmation

Buy this book with your credits?

Submit Your Feedback

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access