Windows Malware Analysis Essentials

An annoying topic for many is negative numbers. Their binary representation relies on a set of workaround techniques that encode negative numbers with the same data types and the same symbol set as positive ones. How, then, would you tell the values apart? A binary pattern is, by itself, neutral. It is a sequence of symbols, each taking one of two possible values, and the particular arrangement of those symbols denotes a value. In essence, a binary pattern could be a number, a picture, a text file, a video file, and so on. What a pattern constitutes also depends on who looks at it and how. Inherently, the pattern is ambiguous without a context to give it a definite meaning.

In compiled machine code, which we will be dealing with, the instructions and opcodes chosen by the compiler build that context around the regular data types, for instance, a DWORD, which is 32 bits long, or a WORD, which is 16 bits long. This structure prevents ambiguity for the translation mechanisms in place. You will learn later, in assembly programming, that the compiler chooses certain instructions based on the data type it infers. Thus, context is supported by design. JAE and JGE are examples of such analogous instructions: JAE (jump if above or equal) is used when comparing unsigned numbers, whereas JGE (jump if greater or equal) is used when comparing signed numbers.

Signed data types effectively halve the positive range of the corresponding unsigned data type. This is because the Most Significant Bit (MSB) is used as the sign bit, so the magnitude is carried by the remaining n - 1 bits: 7 bits (2^7 patterns) for a signed byte and 31 bits (2^31 patterns) for a signed DWORD. The largest positive value is 2^(n-1) - 1, so for a byte it is 2^7 - 1 = 127, whereas the most negative value (in the 2's complement representation described next) is -2^(n-1), which is -128 for a byte. Since each bit position is a power of 2, giving up one bit at the left (the direction of increasing weight) halves the number of values that can be represented (a shift left multiplies by 2 and a shift right divides by 2). Any time you see the familiar formula 2^n - 1, you know that it is the maximum value of an n-bit unsigned data type, the last value in the sequence of bit patterns, whereas 2^n is the total number of values that can be represented, including zero. You will see this formula in lots of computing areas, including the search for large prime numbers, which is an active area of research.

The main methods used are sign and magnitude, 1's complement, and 2's complement. In sign and magnitude, the MSB is set to denote a negative number and the remaining bits hold the magnitude. In 1's complement, a number is negated by inverting all of its bits (the x86 NOT instruction). In 2's complement, 1 is added to the 1's complement (the x86 NEG instruction produces this result directly). Is 0xFFFFFFFF equal to (2^32) - 1, or is it -1? It depends on whether the data type is unsigned (zero and positive values only) or signed (negative and positive values), and you can check which interpretation applies in your debugger (an in-depth introduction comes later). Note from the table below that zero is redundantly represented by two bit patterns (+0 and -0) in sign and magnitude and in 1's complement.

For our purposes, and keeping in mind the C data types as sized by compilers targeting Windows: char is 1 byte, short is 2 bytes, int is 4 bytes, long is 4 bytes (on other 64-bit platforms, long is commonly 8 bytes), and double is 8 bytes. In the assembly and Windows naming used in this book, a word is 2 bytes, a dword is 4 bytes, and a qword is 8 bytes, while an sbyte is still a byte (8 bits) whose MSB now represents the minus sign, which effectively halves its positive data range.

For the C types, you can write a simple program with the lines:

#include <stdio.h>

int main(void) {
    /* sizeof yields a size_t, so print it with %zu (C99 and later) */
    printf("%zu\n", sizeof(double));
    return 0;
}

Insert your data type of choice inside the sizeof() operator.

Binary addition and subtraction of unsigned numbers is another curious area. When you add 1 + 1 in decimal, there is a symbol, 2, to denote two entities, so you can write 2 as the result. In binary, however, the symbol set shares only the two symbols 0 and 1 with decimal. Hence, to represent larger quantities, you displace the same symbols toward the left. Binary does not use the decimal symbols, so 2 in binary is 10, which is not the decimal 10. Is 1 + 1 + 1 = 3? That would be wrong in binary terms, because binary has no symbol for 3, even though the quantity 3 can be represented validly. The result is the binary symbol sequence 11, not the decimal 11.

Signed numbers have to compare the carry into and the carry out of the MSB position to check for overflow conditions. If the carry-in value is the same as the carry-out value, there is no overflow; if they differ, there is an overflow. This is essential for the proper representation of signed data types and for addition and subtraction between them. For the internal circuitry, this is a simple XOR comparison of the two carries (gates are covered in later sections of this chapter), which is a much more streamlined affair than other error-checking solutions. The microprocessor records conditions such as this during calculations in the EFLAGS register, where the OF (Overflow Flag) bit is set whenever a signed overflow occurs.

A signed data type overflow conditions table

Let us delve into signed data types and overflow conditions, which are summarized in the following table:

Carry into MSB   Carry out of MSB   Overflow (OF)
0                0                  No
0                1                  Yes
1                0                  Yes
1                1                  No

In other words, if there is a carry out of the MSB with no carry in, there is an overflow, and if there is a carry into the MSB with no carry out, there is also an overflow.

For instance, (-7) + (-7) = -14:

     1111 1001   (-7)
   + 1111 1001   (-7)
  (1)1111 0010   (-14)

The carry into the MSB is 1: at bit 6, 1 + 1 + 1 = 11 in binary, which leaves 1 and carries 1.

The carry out of the MSB is 1 as well, and it is discarded. Because the carry in and the carry out are the same, there is no overflow, and the result, negative 14, is valid. You can check it: NOT (bitwise inversion) of 1111 0010 gives 0000 1101 = 13, and adding 1 gives 14, so the pattern is the 2's complement of 14. Since the MSB is set, the pattern is a negative number when read as a signed data type, which adheres to the representation requirements.

Take another example:

     1100 0000   (192)
   + 1011 0001   (177)
  (1)0111 0001

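Read as unsigned numbers, this evaluates to 369, which is larger than 255, the largest value an unsigned byte can hold (a byte has 256 possible values). Hence, treating the numbers as unsigned produces an out-of-range result: the carry out of the MSB (which sets the CF, or Carry Flag) is lost, and only 0111 0001 remains.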

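However, if we take the type as the signed type:

The first pattern is the 2's complement of 64, since NOT (1100 0000) + 1 = 0011 1111 + 1 = 64, so it represents -64.
The second pattern is the 2's complement of 79, since NOT (1011 0001) + 1 = 0100 1110 + 1 = 79, so it represents -79.
Taken as signed numbers, the true sum is (-64) + (-79) = -143, but the 8-bit result 0111 0001 reads as +113, a positive number obtained from two negative operands.
As a signed type, the byte has 127 as the largest positive number and -128 as the most negative number, so -143 cannot be represented.
The carry into the MSB is 0, while the carry out of the MSB is 1. They differ, so this is a signed overflow, and OF is set.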

Remember that a rollover (wraparound) effect happens when the end of the range is passed on either side. Counting down from -64, it takes 64 steps to reach -128, the most negative value a signed byte can hold. One more step rolls over to 127, the largest positive value, and the remaining 14 of the 79 steps continue downward from 127 to 113. This is why the stored result is +113 rather than -143, and it is exactly this wraparound that the overflow flag reports.

The same process applies to the larger signed data types, such as WORD, DWORD, and QWORD.

A better way to understand negative representation is the simple mathematical result of adding a positive and a negative number of the same magnitude: 5 + (-5) = 0. So, you can ask the question: what number, when added to a positive number, gives 0? This is the key to understanding negative number representation, its many optimized notation systems, and their pros and cons.

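Say we take the decimal 5 and convert it to its 4-bit binary notation, 0101. The pattern that, added to 0101, leaves 0000 in the four available bits is 1011:

     0101   (5)
   + 1011
  (1)0000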

The 1 carried out at the end is discarded: it does not fit in the current 4-bit data type, and the four bits that remain already hold the requisite value, 0000, so for our purposes the carry is a disposable artifact.

So, we get 1011 as negative 5. Read as an unsigned number, the same pattern is 11, but for signed types the data range is split into two parts: positive and negative bit patterns. Note another feature of this result. If you subtract 1 from it, you get the 1's complement of the original value: 5 = 0101 and (result - 1) = 1010, which is the bitwise inversion of 0101. The final result itself is therefore the 1's complement plus 1. In terms of bit patterns, you are doing a NOT operation and a (NOT + 1) operation, and x86 microprocessors provide the NOT and NEG instructions to perform them at the bitwise level. So now, negative values can be computed and represented logically instead of numerically for every number that falls within the range of a data type.

2's complement is currently the preferred method: unlike 1's complement, its additions need no end-around carry (adding the carry out back into the result), subtraction is simpler, and there are no separate positive and negative zeroes. This saves CPU time and removes the need for additional circuitry designed specifically for negative numbers, so the benefit of using the same addition circuitry (ALU) for both addition and subtraction (negation followed by addition) is very evident.

We will delve more into other number representation schemes (real numbers/fixed and floating point numbers), BCD, assembly programming, deciphering disassembly, and arithmetic in the coming chapters.

By: Victor Marak
