## Two different ways to calculate r-ary Huffman Code

Calculating a binary Huffman code is easy: amalgamate the symbols in the source, two-by-two, and then reverse it to get the code.

But what of an r-ary Huffman code, of arbitary $r$.

Text books give two “different” methods which puzzled me at first until (not too much, I admit) thinking convinced me they were the same.

In a normal Huffman ‘compression’ we might have a source $S$ of $q$ elements, for a code of base $r$ we would reduce $S$ to $S^{\prime}$ with $q - (r - 1)$ elements by merging $r$ symbols in $S$.

To do this properly we need to have $r$ symbols left for the final merge. To do this then, if we start with $N$ symbols, then $1 \equiv N mod(r - 1)$.

The difference between the two methods suggested in text books is either positing additional symbols of probability zero such that $1 \equiv N mod(r-1)$ or ensuring the first merger is of only $q$ symbols where $(N-q)$ $mod(r-1) = 0$.

Both, are of course, functionally similar. For instance, with $N = 8$ and $r = 4$, then $N mod(r -1) = 2$, so we can begin by adding two symbols of probability zero or the first reduction should be of size $s$ such that $(N-s)$ $mod(r-1) = 0$ – and we can see that, in this case, $s = 2$.