Struggling with coin tossing paradox

The method of Eratosthenes used to sieve out p...
The method of Eratosthenes used to sieve out prime numbers is employed in this proof. (Photo credit: Wikipedia)

I was led to this by the discussion on the non-random nature of prime numbers – as apparently it inspired one of the authors of the paper that noted this. I am struggling a bit with the maths of this, so hopefully writing it out might either help me grasp what is wrong with my exposition or else somebody will explain to me what I am doing wrong.

The paradox, taking H for heads and T for tails, is this – if you have a coin and toss it twice there is an equal probability (0.25) that you will see the sequence HH or HT.

But if you have two coins and toss them then, on average, it will take six tosses before you see HH but ocannly four to see HT.

I have no problem with the HT sequence – inside four tosses you can reach this via:


which a simple sum of probabilities shows comes to: \frac{4}{16} + \frac{2}{16} + \frac{1}{16} + \frac{2}{16} + \frac{1}{16} + \frac{1}{16} = \frac{11}{16}, greater than half, while just three tosses would give a probability of 0.5.

So, the issue is with the HH combination. With just five tosses we can get:


Which sums to: \frac{8}{32} + \frac{2}{32} + \frac{1}{32} + \frac{4}{32} + \frac{2}{32} + \frac{1}{32} + \frac{1}{32} = \frac{19}{32} – i.e., more than half (four tosses give us odds of 0.5).

So where is the flaw in my reasoning?


4 responses to “Struggling with coin tossing paradox”

  1. You’re not computing a mean of the number of tosses required. What you have in your partial reckoning of the HH case is (8/32)(2) + (2/32)(4) + (1/32)(5) + … = 59/32 ~= 1.84; but that’s only a portion of the mean number of tosses required to see HH.

    One way to derive the expected value of six tosses is to let x_H and x_T be the expected number of tosses needed to win given that the last coin tossed was heads or tails respectively. (We’ll assume the starting state follows a “zeroth” tails result.) This leads to the equations

    x_H = (1/2)(1) + (1/4)(2 + x_H) + (1/4)(2 + x_T)
    x_T = 2 + (1/4)(0) + (1/4)(x_H) + (1/2)(x_T).

    Solve simultaneously, and you find x_T = 6.

  2. Thanks, as always. I’m still not sure I get it, but I suspect that’s because I’m a bit stupid. Let me think about it some more.
    What does the probability tell us then?

    1. I don’t think it’s you being stupid, I think it’s probability theory being unintuitive at times. I just did a blog post on it (the same observation having come up in a recent discussion in this hemisphere). I’m not sure it will be helpful, but perhaps.

    2. Two more thoughts on this … First, your enumeration of short sequences leading to success, together with their probabilities, gives you an idea of what the _median_ number of tosses required to find the target pattern might be. The distribution of the required number of tosses is skewed, though; you cannot get by with fewer than two, but with a sufficiently annoying coin you could require an arbitrarily large number. So the mean is greater than the median (for both patterns), but by how much is unclear.

      Second, I can offer an intuition for the direction of the discrepancy in means, though not for the exact number (six versus four). Assume we toss a single coin repeatedly until the sequence yields our target pattern (HH or HT). If the last result was T (or if we are just starting), we need a minimum of two tosses to match either target. If the last result was H, we have a 50-50 chance of matching either target with one more toss. So far, the two targets behave the same. The difference lies in the case that we last saw a head and failed to match the pattern on the following toss. If the pattern is HH, failure to match means we tossed a tail, and now we need a minimum of two more tosses. If the pattern is HT, we are coming from H, and we failed to match, our latest toss was another H, and we still have a 50-50 chance of needing only a single additional toss.

%d bloggers like this: