Book Image

Hands-On Cryptography with Python

By : Samuel Bowne
Book Image

Hands-On Cryptography with Python

By: Samuel Bowne

Overview of this book

Cryptography is essential for protecting sensitive information, but it is often performed inadequately or incorrectly. Hands-On Cryptography with Python starts by showing you how to encrypt and evaluate your data. The book will then walk you through various data encryption methods,such as obfuscation, hashing, and strong encryption, and will show how you can attack cryptographic systems. You will learn how to create hashes, crack them, and will understand why they are so different from each other. In the concluding chapters, you will use three NIST-recommended systems: the Advanced Encryption Standard (AES), the Secure Hash Algorithm (SHA), and the Rivest-Shamir-Adleman (RSA). By the end of this book, you will be able to deal with common errors in encryption.
Table of Contents (9 chapters)

base64 encoding


We will now discuss encoding ASCII data as bytes and base64 encoding these bytes. We will also cover base64 encoding for binary data and decoding to get back to the original input.

ASCII data

In ASCII, each character turns into one byte:

  • A is 65 in base 10, and in binary, it is 0b01000001. Here, you have 0 in the most significant bit because there's no 128, then you have 1 in the next bit for 64 and 1 in the end, so you have 64 + 1=65.
  • The next is B with base 66 and C with base 67. The binary for B is 0b01000010, and for C, it is 0b01000011.

The three-letter string ABC can be interpreted as a 24-bit string that looks like this:

We've added these blue lines just to show where the bytes are broken out. To interpret that as base64, you need to break it into groups of 6 bits. 6 bits have a total of 64 combinations, so you need 64 characters to encode it.

The characters used are as follows:

We use the capital letters for the first 26, lowercase letters for another 26, the digits for another 10, which gets you up to 62 characters. In the most common form of base64, you use + and / for the last two characters:

If you have an ASCII string of three characters, it turns into 24 bits interpreted as 3 groups of 8. If you just break them up into 4 groups of 6, you have 4 numbers between 0 and 63, and in this case, they turn into QUJ, and D. In Python, you just have a string followed by the command:

>>> "ABC".encode("base64")
'QUJD\n'

This will do the encoding. Then add an extra carriage return at the end, which neither matters nor affects the decoding.

What if you have something other than a group of 3 bytes?

Note

The = sign is used to indicate padding if the input string length is not a multiple of 3 bytes.

If you have four bytes for the input, then the base64 encoding ends with two equals signs, just to indicate that it had to add two characters of padding. If you have five bytes, you have one equals sign, and if you have six bytes, then there's no equals signs, indicating that the input fit neatly into base64 with no need for padding. The padding is null.

You take ABCD and encode it and then you take ABCD with explicit byte of zero. x00 means a single character with eight bits of zero, and you get the same result with just an extra A and one equals, and if you fill it out all the way with two bytes of zero, you get capital A all the way. Remember: a capital A is the very first character in base64. It stands for six bits of zero.

Let's take a look at base64 encoding in Python:

  1. We will start python up and make a string. If you just make a string with quotes and press Enter, it will print it in immediate mode:
>>> "ABC"
'ABC'
  1. Python will print the result of each calculation automatically. If we encode that with base64, we will get this:
>>> "ABC".encode(""base64")
'QUJD\n'
  1. It turns into QUJD with an extra courage return at the end and if we make it longer:
>>> "ABCD".encode("base64")
'QUJDRA==\n'
  1. This has two equals signs because we started with four bytes, and it had to add two more to make it a multiple of three:
>>> "ABCDE".encode("base64")
'QUJDREU=\n'
>>> "ABCDEF".encode("base64")
'QUJDREVG\n'
  1. With a five-byte input, we have one equals sign; and with six bytes of input, we have no more equal signs, instead, we have a total of eight characters with base64.
  2. Let's go back to ABCD with the two equals signs:
>>>"ABCD".encode("base64")
'QUJDRA==\n'
  1. You can see how the padding was done by putting it in explicitly here:
>>> "ABCD\x00\x00".encode("base64")
'QUJDRAA=\n'

There's a first byte of zero, and now we get another single equals sign.

  1. Let's put in a second byte of zero:
>>> "ABCD\x00\x00".encode("base64")
'QUJDRAAA\n'

We have no padding here, and we see that the last characters are all A, indicating that there's been a filling of binary zeros.

Binary data

The next issue is handling binary data. Executable files are binary and not ASCII. Also, images, movies, and many other files have binary data. ASCII data always starts with a zero as the first bit, but base64 works fine with binary data. Here is a common executable file, a forensic utility; it starts with MZê and has unprintable ASCII characters:

As this is a hex viewer, you see the raw data in hexadecimal, and on the right, it attempts to print it as ASCII. Windows programs have this string at the start, and this program cannot be run in DOS mode, but they have a lot of unprintable characters, such as FF and 0, which really doesn't matter for Python at all. An easy way to encode data like that is to read it directly from the file. You can use thewithcommand. It will just open a file with filename and mode read binary with the handlefand then you can read it. Thewithcommand is here just to tell Python to open the file, and that if it cannot be opened due to some error, then just to close the handle and then decode it exactly the same way. To decode data you've encoded in this fashion, you just take the output string and you put .decode instead of .encode.

Now let's take a look at how to handle binary data:

  1. We will first exit Python so that we can see the filesystem, and then we'll look for the Ac file using the command shown here:
>>> exit()
$ ls Ac*
AccessData Registry Viewer_1.8.3.exe

There's the filename. Since that's kind of a long block, we are just going to copy and paste it.

  1. Now we start Python and clear the screen using the following command:
$ clear
  1. We will start python again:
$ python
  1. Alright, so, now we use the following command:
>>> with open("AccessData Registry Viewer_1.8.3.exe", "rb") as f:
... data = f.read()
... print data.encode("base64")

Here we enter the filename first and then the mode, which is read binary. We will give it filename handle of f. We will take all the data and put it in a single variable data. We could just encode the data inbase64, and it would automatically print it. If you have an intended block in Python, you have to pressEntertwice so it knows the block is done, and thenbase64 encodes it.

  1. You get a long block of base64 that is not very readable, but this is a handy way to handle data like that; say, if you want to email it or put it in some other text format. So, to do the decoding, let's encode something simpler so that we can easily see the result:
>>> "ABC".encode("base64")
'QUJD\n'
  1. If we want to play with it, put that in a cvariable using the following command:
>>> c = "ABC".encode("base64")
>>> print c
QUJD
  1. Now we can print c to make sure that we have got what we expected. We have QUJD, which is what we expected. So, now we can decode it using the following command:
>>> c.decode("base64")
'ABC'

base64 is not encrypting. It is not hiding anything, but it is just another way to represent it. In the next section, we'll cover XOR.