Update: I had another look at this. Actually, in retrospect, the substance of the lecture notes are not wrong, but they are worded for electronic engineers rather than computer scientists or mathematicians
Normally one would present the process of creating a code word in the form:
Where is the code word, is the message to be coded and the generator matrix. The lecture notes present the generator matrix in a systematic form which would be the equivalent of:
, where is the identity matrix.
Here’s a quandry: you are researching “Cyclic Redundancy Codes” (CRCs) – mainly for finding a clearer way to explain them, when you come across some lecture notes from an academic at one of the bigger universities in Western Europe which, initially at least, look like a model of clear exposition. But as you read on, you realise with growing incredulity that what the lecturer describes as CRCs are not CRCs at all but general linear block codes. Now, the principles are similar – in that the lecturer discusses generator matrices and parity check matrices, but the details are hopelessly wrong. So, do you (a) denounce the academic in public, or (b) contact them in private or (c) post a ‘blind’ blog and wonder about what value students and taxpayers are getting for their money?
But now, thanks to another book – Networks on Chips – here’s a slightly more fundamental examination.
CRCs are a form of Cyclic codes – codes where a rotation – eg., make the 0 bit the highest order bit, the previously highest order bit the second highest order bit and so on – produces a new codeword.
As before we can think of code words as polynomials of the power of 2 with coefficients of 0 or 1, eg.,
(And of course any natural number can be represented in this fashion.)
Then a circular shift is similar to multiplying by 2 – though with the proviso that
We can construct a codeword in the form where is the message, is the (monic) generator and a remainder (of the same or lesser order as ).
But if we stipulate that is of the smallest order of a codeword then that means must be zero.
So, what does a generator matrix look like?
If we have a monic generator polynomial of degree and a code word length of :
- Deriving an Incremental Form of the Polynomial Regression Equations (erikerlandson.github.com)
- A smattering of MyForth: CRC16 checksums (toddbot.blogspot.com)
- Selecting a Checksum algorithm (fastcompression.blogspot.com)
- Parity checks as matrix multiplication (cartesianproduct.wordpress.com)
This, dismally crude (sorry), drawing shows how it is done for the simplest type of CRC – a parity bit. This arrangement calculates an even parity bit (ie it ensures that the bit stream transmitted always contains an even number of ones).
As this is a CRC the parity bit is originally set to 0. So for input 1011 we have:
(input bit) 1 XOR (parity bit) 0 = (final state of parity bit) 1
0 XOR 1 = 1
1 XOR 1 = 0
1 XOR 0 = 1
(so in this case the bit is set to 1)
And for 100111 we have:
1 XOR 0 = 1
0 XOR 1 = 1
0 XOR 1 = 1
1 XOR 1 = 0
1 XOR 0 = 1
1 XOR 1 = 0
So, as the input already has an even number of bits then the parity bit would be set to 0.
This is the equivalent of a CRC generator polynomial of . A parity bit will catch the switching of one (or any odd number) of bits but will fail to detect errors for any even number of bit switches, so it is of some, but limited, use.
For bigger generator polynomials a similar arrangement is used – with more bits where the parity register is (see Computer Networks and Internets).
If you have ever downloaded any software from the internet you are quite likely to have seen a reference to a “CRC” – a cyclic redundancy check.
I have been reading about them as I plough on through Andrew Tanenbaum‘s (pictured) Computer Networks: so here is an explanation based closely on his text (though it’s not a copy except where indicated)…
We say that a is a list of coefficients for a polynomial with terms ie for a polynomial ranging from to (which is said to have ).
So represents .
Polynomial arithmetic is a form of modulo-2 arithmetic “according to the rules of algebraic field theory” (that’s what it says in the book!). In practise what this means is that there are no “carry-overs” and addition and subtraction are identical and the same as exclusive or (XOR).
Hence while and so on…
To use a CRC a sender and receiver must agree a generator polynomial: in fact these are now generally standardised and IEEE 802 (network protocols) defines:
(the last two terms are of course equivalent to ).
1. Let us call this generator polynomial and let represent the polynomial (ie the sequence of 0s and 1s) we wish to transmit (with bits) – must have more bits than .
2. Let r be the degree of – so we append r zero bits to the lower end of the frame, which now has bits and is (or ).
3. Divide by using modulo 2 division.
4. Subtract the remainder (which will always be r or fewer bits (as is of degree r) from the string (again using modulo 2 subtraction). This gives the checksummed frame to be transmitted – we shall call this .
If the frame is transmitted correctly then when it is received it should be cleanly divisible by (as taking away the remainder in step 4 should leave us with a cleanly divisible frame). The additional checksum is merely the additional bits at the end.
The power of the checksum relies on some of its mathematical properties – it will always detect single bit errors if contains more than one term – as the bad frame is now where is the error but in this case where is the “bad bit” but with a two-termed , for any .
Similarly, if there are two errors then so if does not cleanly divide where can be any value from 1 to the maximum of (ie the frame length) then any two bit error would be detected.
Similarly, if there are an odd number of errors then cannot be divisible by a that contains the terms . We can show by a reductio ad absurdum proof:
1. We know that an must be capable of evaluating to one: eg must evaluate to 1 if . But if is divisible by then , but if then, modulo 2, so .
To really fool the system the error pattern would have to match which is, assuming all transmission errors are equally likely (in fact they are not, but we will ignore that for now), has a probability (again, the book says) of which is pretty small for the IEEE picked . (The GNU calculator says it is zero – so it’s too small to register there!)
- Recovering Outlook PST File Corruption (brighthub.com)
- How error in binary data can be detected (wiki.answers.com)
- Fourier Series: before I begin… (cartesianproduct.wordpress.com)
- Pretty pictures of sets of roots (gasstationwithoutpumps.wordpress.com)
- Applications of Complex Number Analysis to Divisibility Problems : Two Undergrad Problems (wpgaurav.wordpress.com)
- Cyclic Subgroups (rkneufeld.wordpress.com)