The mathematics of all this is deceptively simple, but of fundamental importance in information theory.
First of all, we can think about what sort of information events convey. There is a famous aphorism about the habits of bears, the direction from which the Sun rises and the title of the head of the Catholic Church. No newspaper or TV bulletin would ever bother to tell us, on a daily basis, that the Sun has indeed risen in the East. In effect that event conveys no information because it is completely expected. Toss a coin, though, and the result is of interest precisely because the outcome is not known in advance.
Hence we can posit an information function I which tells us how much information we can derive from an event, or, if we think of symbols in a data stream, we can call this I(x) – the amount of information we can derive if the next symbol we see is x.
From our earlier discussion we can see that the result of this function depends on the probability of the event occurring or the symbol being received – i.e. if we know the Sun always rises in the East we derive no information from that very thing happening. Hence:
I(x) is a function of P(x), where P(x) is the probability that the next symbol to be seen is x. As we will only consider ‘memoryless’ sources we will also insist that P(x) is independent of time and of any previously seen symbol. Hence:
I(xy) = I(x) + I(y) (i.e. the information we can gain from seeing the symbol x followed by the symbol y is the sum of the information gained from each).
Taking these points together we can define I(x) = log2(1/P(x)) = -log2 P(x), which, with the base-2 logarithm, measures information in bits.
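This definition can be sketched in a few lines of Python (the function name is mine; base 2 gives the answer in bits):

```python
import math

def information(p):
    """Self-information of an event with probability p, in bits:
    I = log2(1/p). Certain events (p = 1) carry no information."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return math.log2(1 / p)

# A fair coin toss carries exactly 1 bit of information.
print(information(0.5))            # 1.0
# A certain event (the Sun rising in the East) carries none.
print(information(1.0))            # 0.0
# Additivity: two independent coin tosses carry 1 + 1 = 2 bits.
print(information(0.5 * 0.5))      # 2.0
```

Note how the logarithm turns the multiplication of independent probabilities into the addition of information, exactly as required above.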
We can see that I(x) tends to infinity as the probability of an event tends to zero (think of the Sun rising in the West!) and tends to 0 as the probability tends to one – i.e. when nothing unexpected happens.
The entropy of a communication system then becomes H = Σ_i P(x_i) log2(1/P(x_i)) – the information of each possible symbol, weighted by the probability of seeing it, i.e. the average information per symbol.
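The entropy sum can be computed directly (a minimal sketch; the function name is mine, and terms with zero probability are skipped since they contribute nothing in the limit):

```python
import math

def entropy(probs):
    """Shannon entropy H = sum p * log2(1/p), in bits per symbol.
    probs is a full probability distribution over the symbol alphabet."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # p * log2(1/p) -> 0 as p -> 0, so zero-probability symbols are skipped.
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # fair coin: 1.0 bit per symbol
print(entropy([1.0]))         # certain event: 0.0 bits
print(entropy([0.25] * 4))    # four equally likely symbols: 2.0 bits
```

As expected, a certain outcome has zero entropy, while a uniform distribution over N symbols has the maximum possible entropy of log2(N) bits.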