# More on Huffman encoding

As well as the basic form of Huffman coding, Huffman’s algorithm can be applied to blocks of symbols (extensions of the source) to deliver compression as well as coding, at the cost that no symbol can be decoded until a whole block has arrived.

Once again, with thanks to *Information and Coding Theory*, here’s a brief explanation.

With Huffman coding we can directly code a binary symbol stream $S = \{s_1, s_2\}$ where $s_1 = 0, s_2 = 1$.

But by encoding blocks we can produce a code for the extension $S^n$, where $n$ is arbitrary, e.g. $S^3 = \{s_1s_1s_1, s_1s_1s_2, s_1s_2s_1, s_1s_2s_2, s_2s_1s_1, s_2s_1s_2, s_2s_2s_1, s_2s_2s_2 \}$.

So let us assume (e.g. we are transmitting a very dark image via our fax machine) that the probability of $s_1$ is $\frac{2}{3}$ and that of $s_2$ is $\frac{1}{3}$.
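Since each block symbol is just three independent draws from $S$, its probability is the product of the three symbol probabilities. A minimal sketch (variable names are mine) using exact fractions:

```python
from fractions import Fraction
from itertools import product

# Symbol probabilities from the example: p(s1) = 2/3, p(s2) = 1/3.
p = {"s1": Fraction(2, 3), "s2": Fraction(1, 3)}

# Enumerate the third extension S^3: every block of three symbols,
# with probability equal to the product of its symbols' probabilities.
blocks = {
    "".join(block): p[block[0]] * p[block[1]] * p[block[2]]
    for block in product(p, repeat=3)
}

for block, prob in blocks.items():
    print(block, prob)
```

This reproduces the eight probabilities listed below, in the same order (e.g. $s_1s_1s_1$ gets $\frac{2}{3}^3 = \frac{8}{27}$).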

Then, applying the successive Huffman reductions (repeatedly merging the two least probable symbols into one new symbol), we have the following probability distributions:

$S^3$:  $\frac{8}{27} , \frac{4}{27} , \frac{4}{27} , \frac{2}{27} , \frac{4}{27} , \frac{2}{27} , \frac{2}{27} , \frac{1}{27}$

$S^{\prime}$: $\frac{8}{27}, \frac{4}{27}, \frac{4}{27}, \frac{2}{27}, \frac{4}{27}, \frac{2}{27}, \frac{3}{27}$

$S^{\prime\prime}$: $\frac{8}{27}, \frac{4}{27}, \frac{4}{27}, \frac{4}{27}, \frac{3}{27}, \frac{4}{27}$

$S^{\prime\prime\prime}$: $\frac{8}{27}, \frac{4}{27}, \frac{4}{27}, \frac{4}{27}, \frac{7}{27}$

$S^{\prime\prime\prime\prime}$: $\frac{8}{27}, \frac{4}{27}, \frac{7}{27}, \frac{8}{27}$

$S^{\prime\prime\prime\prime\prime}$: $\frac{8}{27}, \frac{8}{27}, \frac{11}{27}$

$S^{\prime\prime\prime\prime\prime\prime}$: $\frac{11}{27}, \frac{16}{27}$

$S^{\prime\prime\prime\prime\prime\prime\prime}$: $1$
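The reduction chain is mechanical enough to sketch in a few lines; here is a minimal version (names are mine) using a heap to repeatedly pull out the two least probable symbols:

```python
import heapq
from fractions import Fraction

# Probabilities of the eight block symbols of S^3, as listed above.
probs = [Fraction(n, 27) for n in (8, 4, 4, 2, 4, 2, 2, 1)]

heap = probs[:]
heapq.heapify(heap)
merged = []                      # probabilities of the new 'joint' symbols
while len(heap) > 1:
    a = heapq.heappop(heap)      # least probable symbol
    b = heapq.heappop(heap)      # second least probable
    heapq.heappush(heap, a + b)  # merge them into one joint symbol
    merged.append(a + b)

# Numerators (over 27) of the joint symbols, in merge order.
print([int(m * 27) for m in merged])  # [3, 4, 7, 8, 11, 16, 27]
print(sum(merged))                    # 76/27
```

The sum of the joint-symbol probabilities is exactly the quantity computed next.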

We can now compute the average word length of the Huffman code – by simply adding the probabilities of the newly created ‘joint’ symbols, one for each merge, including the final merge into the root:

$= \frac{3}{27} + \frac{4}{27} + \frac{7}{27} + \frac{8}{27} + \frac{11}{27} + \frac{16}{27} + 1 = \frac{76}{27}$

Which obviously looks like the opposite of compression! But, of course, each codeword covers three source symbols, so the average word length per symbol becomes $\frac{76}{81} \approx 0.938$, in other words somewhat less than 1 bit. It can never fall below the source entropy $H(S) = \log_2 3 - \frac{2}{3} \approx 0.918$ bits per symbol, which block coding approaches as the block length grows.
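As a quick sanity check on that final arithmetic, a short sketch (names are mine) comparing the per-symbol rate against the entropy bound:

```python
from fractions import Fraction
from math import log2

# Total average codeword length for blocks of three, from the reductions.
avg_block_length = Fraction(76, 27)
per_symbol = avg_block_length / 3            # = 76/81 bits per source symbol

# Source entropy H(S) for p(s1) = 2/3, p(s2) = 1/3: the lower bound
# that no uniquely decodable code can beat.
p1, p2 = 2 / 3, 1 / 3
entropy = -(p1 * log2(p1) + p2 * log2(p2))

print(float(per_symbol))                     # ≈ 0.938, less than 1
print(entropy)                               # ≈ 0.918
```

The per-symbol rate sits between the entropy and 1 bit, exactly as the argument above requires.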