Tagged: Binomial distribution

More on error correction


Considering the chances of a decoding error (ie having more errors than our error correction code can handle)…

P(no errors) = (1 -p)^n where p is the probability of a bit flip and n the length of the code.

So in our case that gives P(no errors) = (0.9)^4 = 0.6561

But we can also work out the possibility of k bit flips, using the binomial distribution:

P(k_{errors}) = _nC_k p^k (1-p)^{n-k}

So what are the prospects of a decoding error. This is the lower bound (only the lower bound because – as the table in the previous post showed – some errors might be detected and some not for a given Hamming distance):

P(total errors) = \sum_{k = d}^n _nC_kp^k(1-p)^{n-k}

For us d=4, n=4, so therefore as 0! = 1, the lower bound in our case is 0.1^4 (0.9)^{4 - 4} = 0.0001 which isn’t bad even for such a noisy channel.

But what is the guaranteed success rate?

Here we are looking at:

\sum_{k=0}^{\lfloor\frac{d - 1}{2}\rfloor} _nC_k p^k (1 - p)^{n - k}

(Recalling d \geq 2v +1 for v bits of error correction)

In our case this gives:

_4C_0 p^0 (1 -p)^4 + _4C_1 p^1 (1 -p)^3 = 0.6561 + 0.2916 = 0.9477

This shows the power of error correction – even though there is a 10% chance of an individual bit flipping, we can actually keep the error down to just over 5%.

The Gaussian distribution


When you hear the term “bell curve” what you are actually listening to is a discussion of the “normal” or “Gaussian” distribution.

This is a probability density function (PDF) of the form:

\textit{f}(x; \mu,\sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x-\mu}{\sigma})^2}

Here \mu is the mean or expectation (peak) and \sigma^2 is the variance (\sigma is the standard deviation) and of course e is Euler’s number (the base for natural logarithms).

But what does it all mean in a physical sense? Where does it come from?

Let’s look at variance first. This measures the spread of the function.

Take an example of a coin tossed four times. Assuming it is a fair coin then we should expect to get 2 heads. But, of course, the process is random so is likely to deviate from that. So variance being the square of deviation, what will that be?

There a \frac{1}{2}^4 = \frac{1}{16} chance of four heads (or four tails), a \frac{4}{16} chance of one head (or three heads, ie 1 tail) and a \frac{6}{16} of 2 heads.

So the variance then becomes:

\frac{1}{16}(0 - 2)^2 + \frac{4}{16}(1 - 2)^2 + \frac{6}{16}(2 - 2)^2 + \frac{4}{16}(3 - 2)^2 + \frac{1}{16}(4 - 2)^2

Which comes out as 1.

The normal distribution is a limiting case of the binomial distribution – which looks at success/fail type discrete variables (of course the coin example above is just such a case.) In the binomial distribution has a PDF of the form:

f(k; N,p) = _NC_k p^kq^{(N-k)}

Where p is the probability of an event happening and q = 1 - p is the probability of it not happening, and where _NC_k is the binomial coefficient and can be spoken of as “N choose k” and is the number of ways of distributing k successes from N trials.

_NC_k = \frac{N!}{k!(N-k)!}

Consider the case where N becomes large…

Here the change of success is p and the chance of failure is (1 – p), so the average result of each test is p + 0 \times (1 - p), so the mean \mu = Np

The variance  for each test is p(1 - p)^2 + (1 - p)(0 - p)^2 = (1 -p)(p - p^2 + p^2) = p(1-p) so the total variance is Np(1-p) .

Now, let’s look at the cumulative distribution function: this is the probability that the result will be less than or equal to x , ie P_r(X \leq x) .

For integer results:

P_r(X \leq x) = \sum_{k = 0}^x _NC_k p^kq^{(N-k)}

(for non-integer results we need to use the floor function for x , \lfloor x \rfloor )

Stirling’s approximation


Trying to find a way to calculate the factorials of large (very large) numbers, so as to at least work through my example for Uranium 235 that I considered when working out, for my own benefit, how the binomial distribution worked.

Got this via Twitter:

In fact, wikipedia states the approximation in a slightly less approximate way:

n! \sim \sqrt{2\pi n} (\frac{n}{e})^n

The binomial distribution, part 2


Binomial probability mass function.

Image via Wikipedia

(Part 1 is here – these notes are to assist me, rather than contain any real news!)

So, if the probability that an event will happen to a single entity in a unit of time is p and the probability it will not happen is 1 - p = q , what is the probability that a large number of events, k , will take place?

Considering the Uranium-235 example again, lets say there are a very large number, N of atoms: how many will decay and emit an \alpha -particle in a given second?

Well we know what the mean number of decays we should expect is: Np . But this is a random process and so will not always deliver that result, instead there will be a distribution of results around this mean.

What does the this distribution look like – ie., what is its probability mass function (pmf)?

For exactly k decays let’s call this f(k; N,p) .

To show where the pmf comes from, let’s look at a much simpler example – tossing a coin four times and seeing if we get exactly one head, ie k=1, N=4 .

One way we could get this is like this: HTTT. The probability of that happening is p^kq^{N-k} = pq^3 = \frac{1}{16} . But that is not the only way exactly one head could be delivered: obviously there are four ways: HTTT, THTT, TTHT, TTTH and so the probability of exactly one head is \frac{4}{16} .

(For two heads we have six ways of arranging the outcome: HHTT, HTHT, HTTH, THHT, THTH, TTHH and so the probability is \frac{6}{16} . For three heads the probability is the same as three tails (ie for one head), and the probabilities for all heads and all tails are both \frac {1}{16} . Cumulatively this covers all the possibilities and adds up to 1 .)

The generalisation of this gives us a pmf thus: f(k; N,p) = _NC_k\ p^kq^{(N-k)}, where _NC_k is the binomial coefficient and can be spoken as “N choose k” and is the number of ways of distributing k successes from N trials.

_NC_k = \frac{N!}{k!(N-k)!}

There are approximately 10^{21} Uranium atoms in a gramme of the substance and calculating factorals of such large numbers efficiently requires an awful lot of computing power – my GNU calculator has been at it for some time now, maxing out one CPU on this box for the last 14 minutes, so I guess I am going to have to pass on my hopes of showing you some of the odds.

The binomial distribution, part 1


Lognormal

Image via Wikipedia

I think there are now going to be a few posts here which essentially are about me rediscovering some A level maths probability theory and writing it down as an aid to memory.

All of this is related as to whether the length of time pages are part of the working set is governed by a stochastic (probabilistic) process or a deterministic process. Why does it matter? Well, if the process was stochastic then in low memory situations a first-in, first-out approach, or simple single queue LRU approach to page replacement might work well in comparison to the 2Q LRU approach currently in use. It is an idea that is worth a little exploring, anyway.

So, now the first maths aide memoire – simple random/probabilistic processes are binomial – something happens or it does not. If the probability of it happening in a unit time period is p (update: is this showing up as ‘nm’? It’s meant to be ‘p’!) then the probability it will not happen is 1 - p = q .  For instance this might be the probability that an atom of Uranium 235 shows \alpha -particle decay (the probability that one U 235 atom will decay is given by its half-life of 700 million years ie., 2.21\times10^{16} seconds, or a probability, if my maths is correct, of a given individual atom decaying in any particular second of approximately 4.4\times10^{-16} .

(In operating systems terms my thinking is that if the time pages spent in a working set were governed by similar processes then there will be a half life for every page that is read in. If we discarded pages after they were in the system after such a half life, or better yet some multiple of the half life, then we could have a simpler page replacement system – we would not need to use a CLOCK algorithm, just record the time a page entered the system and stick it in a FIFO queue and discard it when the entry time was more than a half life ago.

An even simpler case might be to just discard pages once the stored number reached above a certain ‘half life’ limit. Crude, certainly, but maybe the simplicity might compensate for the lack of sophistication.

Such a system would not work very well for a general/desktop operating system – as the graph for the MySQL daemon referred to in the previous blog shows, even one application could seem to show different distributions of working set sizes. But what if you had a specialist system where the OS only ran one application – then tuning might work: perhaps that could even apply to mass electionics devices, such as Android phones – after all the Android (Dalvik) VM is what is being run each time.)