I suppose this ought to be obvious, but I had not really thought about it before now…

The Hamming distance between two bit streams is the number of bits that must be flipped to make one stream equal to the other. In error detection and error correction, the bigger the Hamming distance between codewords, the more bit errors can be detected or corrected.

But, of course, the Hamming distance is not the same as the arithmetic difference. Fairly obviously, `0111` and `1000` have an arithmetic difference of just 1, but a Hamming distance of 4.
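A few lines of Python (my own sketch, not part of the original argument) make the contrast concrete:

```python
def hamming(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length bit strings differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

# Arithmetic difference is tiny, Hamming distance is maximal:
print(abs(int("0111", 2) - int("1000", 2)))  # 1
print(hamming("0111", "1000"))               # 4
```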

Let me try to illustrate with a practical example. We encode 0 and 1 in 4 bits. The chance of any single bit flipping due to noise or other errors is 10% (and so the chance of it not flipping is 90%). What are the best codes to use?

| Output word \ Source word | “1000” | “0001” | “1100” | “0011” | “1000” | “0111” | “0000” | “1111” |
|---|---|---|---|---|---|---|---|---|
| “0000” | 0.0729 | 0.0729 | 0.0081 | 0.0081 | 0.0729 | 0.0009 | 0.6561 | 0.0001 |
| “0001” | 0.0081 | 0.6561 | 0.0009 | 0.0729 | 0.0081 | 0.0081 | 0.0729 | 0.0009 |
| “0010” | 0.0081 | 0.0081 | 0.0009 | 0.0729 | 0.0081 | 0.0081 | 0.0729 | 0.0009 |
| “0011” | 0.0009 | 0.0729 | 0.0001 | 0.6561 | 0.0009 | 0.0729 | 0.0081 | 0.0081 |
| “0100” | 0.0081 | 0.0081 | 0.0729 | 0.0009 | 0.0081 | 0.0081 | 0.0729 | 0.0009 |
| “0101” | 0.0009 | 0.0729 | 0.0081 | 0.0081 | 0.0009 | 0.0729 | 0.0081 | 0.0081 |
| “0110” | 0.0009 | 0.0009 | 0.0081 | 0.0081 | 0.0009 | 0.0729 | 0.0081 | 0.0081 |
| “0111” | 0.0001 | 0.0081 | 0.0009 | 0.0729 | 0.0001 | 0.6561 | 0.0009 | 0.0729 |
| “1000” | 0.6561 | 0.0081 | 0.0729 | 0.0009 | 0.6561 | 0.0001 | 0.0729 | 0.0009 |
| “1001” | 0.0729 | 0.0729 | 0.0081 | 0.0081 | 0.0729 | 0.0009 | 0.0081 | 0.0081 |
| “1010” | 0.0729 | 0.0009 | 0.0081 | 0.0081 | 0.0729 | 0.0009 | 0.0081 | 0.0081 |
| “1011” | 0.0081 | 0.0081 | 0.0009 | 0.0729 | 0.0081 | 0.0081 | 0.0009 | 0.0729 |
| “1100” | 0.0729 | 0.0009 | 0.6561 | 0.0001 | 0.0729 | 0.0009 | 0.0081 | 0.0081 |
| “1101” | 0.0081 | 0.0081 | 0.0729 | 0.0009 | 0.0081 | 0.0081 | 0.0009 | 0.0729 |
| “1110” | 0.0081 | 0.0001 | 0.0729 | 0.0009 | 0.0081 | 0.0081 | 0.0009 | 0.0729 |
| “1111” | 0.0009 | 0.0009 | 0.0081 | 0.0081 | 0.0009 | 0.0729 | 0.0001 | 0.6561 |
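Every entry in the table follows directly from the channel model: if a received word differs from the transmitted code in *h* positions, its probability is 0.1^*h* × 0.9^(4−*h*). A short sketch (assuming the independent 10% bit-flip model above) reproduces any column:

```python
def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def p_receive(received: str, sent: str, p_flip: float = 0.1) -> float:
    """Probability of receiving `received` given `sent` was transmitted,
    with each bit flipping independently with probability p_flip."""
    h = hamming(received, sent)
    return (p_flip ** h) * ((1 - p_flip) ** (len(sent) - h))

# One column of the table: all 16 possible outputs for source word "1000"
for r in (format(i, "04b") for i in range(16)):
    print(r, round(p_receive(r, "1000"), 4))
```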

Hopefully the table above (read pairs of source words across the top, then look down the rows for the likelihood of receiving each possible output word) shows the difference between Hamming distance and arithmetic difference: choosing `1000` and `0001` as the codes gives poor results – there are 8 “collisions” between the hashes – and, what’s worse, two of them occur on just a single bit flip – something much more likely to happen than a double or triple bit flip. The problem is that the two codes are just a Hamming distance of 2 apart.

But then, look at the last three coding schemes. These are of various arithmetic distances – one being based on successive numbers – but as they have the same Hamming distance (4) they give the same result – six collisions, though none on a single bit.
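These collision counts can be checked mechanically: a collision (my own framing of the term used above) is an output word equidistant from both codes of a pair, so the receiver cannot tell which was sent. A sketch under the same 4-bit model:

```python
from itertools import product

def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def collisions(c1: str, c2: str) -> list[str]:
    """Output words equidistant from both codes - the receiver cannot
    decide which code was sent, however it decodes them."""
    words = ["".join(bits) for bits in product("01", repeat=4)]
    return [w for w in words if hamming(w, c1) == hamming(w, c2)]

for c1, c2 in [("1000", "0001"), ("1100", "0011"), ("1000", "0111"), ("0000", "1111")]:
    print(c1, c2, "distance:", hamming(c1, c2), "collisions:", len(collisions(c1, c2)))
```

The first pair (distance 2) produces 8 collisions; each of the distance-4 pairs produces 6, and none of those 6 is a single bit flip from either code.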

These comparisons also illustrate another aspect of block-encoding-based error correction and detection. All the codes successfully allow for 1-bit error detection – as even the Hamming distance for the poorly chosen codes is big enough (for *u*-bit error detection, the Hamming distance must be at least *u* + 1). This of course means that the last three pairs can detect up to three bits of error.

But error correction is much more limited – the hash collisions mean that we can only be sure of correcting 1-bit errors: this is because to correct *v* bits of error the Hamming distance must be at least 2*v* + 1.
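Putting the two rules together (distance *d* detects up to *d* − 1 errors and corrects up to ⌊(*d* − 1)/2⌋), the guaranteed capabilities of each pair fall out of its Hamming distance alone – a minimal sketch:

```python
def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def capabilities(c1: str, c2: str) -> tuple[int, int]:
    """For a two-codeword code of Hamming distance d: guaranteed
    detection of up to d - 1 errors, correction of up to (d - 1) // 2."""
    d = hamming(c1, c2)
    return d - 1, (d - 1) // 2

print(capabilities("1000", "0001"))  # (1, 0): detect 1-bit errors, correct none
print(capabilities("0000", "1111"))  # (3, 1): detect up to 3 bits, correct 1
```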

###### Related articles

- Ham It Up (drdobbs.com)
- Convolutional coding (cartesianproduct.wordpress.com)
- The Central Limit Theorem Makes Random Testing Hard (regehr.org)
