Encoding a string using a Huffman tree
A Huffman tree allows efficient encoding of data by calculating a probability distribution of characters to optimize the space taken per character. Imagine compressing this book into one piece of paper and back without any information loss. Huffman trees allow this type of optimal lossless data compression based on statistics.
In this recipe, we will implement a Huffman tree from a source of text and produce a string representation of its Huffman codes.
For example, the string "hello world" contains 11 characters, which, depending on the encoding and architecture, may take up as few as 11 bytes of space to represent. The code in this recipe will transform the string into just 51 bits, or 6.375 bytes.
Getting ready
Make sure to be connected to the Internet since this recipe will download text from http://norgiv.com/big.txt to analyze probability distribution of many characters. We will be using the min-heap that was implemented in the previous recipe by...