Huffman Encoding

In its most general sense, computer data consists of a sequence of bytes. Huffman encoding is a lossless compression technique which exploits the fact that, as a general observation, not all byte values occur with equal frequency in the data. Given that is the case, considerable savings can be realized by representing byte values using a variable number of bits - short bit runs for the more frequently occurring values and longer ones for the less frequent ones. We refer to these variable length bit runs as Huffman Codes. The technique used for generating these codes is as follows

This is best understood by looking at an example. Consider the string inflate deflate. The Huffman tree for this string is shown below

Huffman tree

The frequencies and the Huffman codes for each leaf node in this tree are tabulated below

CharacterFrequencyHuffman Code
e300
a2010
f2011
l2100
t2101
(space)11100
d11101
i11110
n11111

Using these codes the original string can be represented by the binary pattern

1110111101110001010100110011010001110001010100

The 47 bits required to store this pattern represents a saving of 60%. While this may sound very good it should be noted that the technique does not always deliver a saving. Without the Huffman tree all we have in the compressed data is an amorphous mass of binary digits. In order to retrieve the original data we need to store some information regarding the tree from which the compressed data were constructed. This imposes an overhead that makes the method unsuitable for very short sequences of data. Click on the link below for a Delphi implementation of the algorithm.

Like the table design? You can find the CSS for it and many other table styles here.

Download
Jump To...

Colophon