Struggling with coin tossing paradox

The method of Eratosthenes used to sieve out p...
The method of Eratosthenes used to sieve out prime numbers is employed in this proof. (Photo credit: Wikipedia)

I was led to this by the discussion on the non-random nature of prime numbers – as apparently it inspired one of the authors of the paper that noted this. I am struggling a bit with the maths of this, so hopefully writing it out might either help me grasp what is wrong with my exposition or else somebody will explain to me what I am doing wrong.

The paradox, taking H for heads and T for tails, is this – if you have a coin and toss it twice there is an equal probability (0.25) that you will see the sequence HH or HT.

But if you have two coins and toss them then, on average, it will take six tosses before you see HH but ocannly four to see HT.

I have no problem with the HT sequence – inside four tosses you can reach this via:


which a simple sum of probabilities shows comes to: \frac{4}{16} + \frac{2}{16} + \frac{1}{16} + \frac{2}{16} + \frac{1}{16} + \frac{1}{16} = \frac{11}{16}, greater than half, while just three tosses would give a probability of 0.5.

So, the issue is with the HH combination. With just five tosses we can get:


Which sums to: \frac{8}{32} + \frac{2}{32} + \frac{1}{32} + \frac{4}{32} + \frac{2}{32} + \frac{1}{32} + \frac{1}{32} = \frac{19}{32} – i.e., more than half (four tosses give us odds of 0.5).

So where is the flaw in my reasoning?

Is dark matter locked up in primordial black holes?

Dark matter pie
Dark matter pie (Photo credit: Wikipedia)

To be honest, I have an issue with both “dark matter” and “dark energy” – they both look like close-to-metaphysical constructs to me: we have a hole where theory and observation do not match so we’ll invent this latter-day phlogiston and call it “dark”.

Then again, I’m not really qualified to comment and it is pretty clear that observations point to missing mass and energy.

I have another issue – if most of the mass of the universe is in the “dark matter” why is there no obvious evidence of it nearby? I don’t mean why can’t we see it – as obviously that is the point – but even though we sit in an area (Earth) of local gravitational field maximum we are struggling to see any local mass effects. (For instance this paper talks about how “local” conditions should impact any dark matter wind but, as far as I can see at least, it’s all entirely theoretical: we haven’t seen any dark matter wind at all.)

So the suggestion – reported in the New Scientist – and outlined in this paper – that actually dark matter is locked up in black holes caused by sound wave compression in the earliest moments of the universe has an appeal. It also potentially fits with the observational evidence that dark matter appears to be knocking around in the halos of galaxies.

These primordial black holes are not a new concept – Bernard Carr and Stephen Hawking wrote about them in 1974 (in this paper). The new evidence for their potential as stores of dark matter comes from the already famous Laser Interferometer Gravitational-Wave Observatory (LIGO) experiment – as that leaves open the prospect that the two black holes that generated the detected gravitational waves could both be in a galactic halo and of the right mass spectrum to suggest they and similar bodies accounted for the gravitational pull we see from dark matter.

All this is quite speculative – the paper points the way to a new generation of experiments rather than proclaims an epoch making discovery, but it’s obviously very interesting and also suggests that the long search for WIMPS – the hypothesised weakly interacting particles that have previously been the favourites as an explanation for dark matter – has essentially been in vain.

Primes are not random

An article in the New Scientist – here – discusses this paper on arxiv on the non-random distribution of the prime numbers.

So what? After all, we know that primes are not randomly distributed and become increasingly less common as we progress up the “number line” – with the Prime Number Theorem telling us that the number of primes up to or less than x, being \pi(x) \sim \frac{x}{ln(x)}.

(This is also takes us to perhaps the best known paradoxes of infinity – even though we know that primes become increasingly rare as we count higher, we also know there are an infinite number of primes: if we had a maximum prime P then if we multiplied all the primes up to an including P and added 1, we would have a number that was not divisible by any prime: a contradiction which demonstrates there can be no maximum prime and hence there must be an infinite number of primes.)

But the lack of randomness here is something deeper – there is a connection between successive primes, something rather more unexpected.

For any prime greater than 5 counted in base 10 we know it must end in either 1, 3, 7 or 9. The even endings (0, 2, 4, 6, 8) are all divisible by 2, while those that end in 5 are also divisible by it (this is what made learning the five times tables so easy for me in Mrs McManus’s P3 class forty-three years ago – the girls on the street loved a skipping rhyme and all I had to do was sing it in my head: five, ten, fifteen, twenty…).

The thinking was that these endings would be randomly distributed, but they are not (a simple result to demonstrate, so it is a sign that not all interesting maths research is yet beyond us mere mortals.)

To simplify things a bit, though, I am (as does the arxiv paper) going to look at the residues the primes leave in modulo 3 counting. Essentially this just means the remainder we get when divide the prime by three (there has to be a remainder, of course, because otherwise the number would not be prime). As is obvious these have to be 1s or 2s – e.g., 5 has a residue of 2, 7 a residue of 1 and so on.

What we are really interested in is the sequence of these residues. If successive primes were wholly independent then we would expect these sequences – i.e., (1, 1) or (1, 2) or (2, 1) or (2, 2) to all be equally likely. Using mathematical notation we describe the frequency of these sequences like this: \pi(x_0; 3, (1, 1)) for the (1, 1) sequence and so on.

For the first million primes we would expect each sequence to have 250,000 occurrences or something very close to that. But we don’t see that at all:

\pi(x_0; 3, (1,1)) = 215,837
\pi(x_0; 3, (1,2)) = 283,957
\pi(x_0; 3, (2,1)) = 283,957
\pi(x_0; 3, (2,2)) = 216,213

Using a list of primes I found online and some C++ and R (listed at the end if you want to mess about with it yourself), I decided to using a little bit of Bayesian inference to see how the non-random the primes are.

I treated the sequences as though they were coin tosses – if we get a switch (1, 2) or (2, 1) then that is the equivalent of a head, while if we get a recurrence (1, 1) or (2, 2) then we have had the equivalent of flipping a tail. If the coin was “fair” then for a 1000 flips we’d get 500 of each type. The result for the first 1000000 primes, though, suggests we’d actually get 568 heads.

(I discussed Bayesian testing for coin fairness some time ago – here.)

With Bayesian inference we can posit a “prior” –   a guess, educated or otherwise – as to how fair the coin we are testing might be. If we know nothing we might start with a prior that assigned equal likelihoods to all results, or we might, in this case, go for a (in retrospect naive) view that the most likely outcome is fairness and that subsequent primes are independent of those that went before.

The first graph – below – is based on saying we are ignorant of the likely result and so assigns an equal likelihood to each result. (A probability of 1 indicates absolute certainty that this is the only possible outcome, while a probability of 0 indicates there is literally zero chance of this happening – it should be noted that using Bayesian inference means we are looking at a prediction for all primes, not simply the results of the first 1000 we test.)

First thousand primes

This looks at the first thousand primes and predicts we’d get something like 650 – 660 heads (if you look closely you’ll see that even as the probability density graph becomes narrower it also moves slowly to the left.)

The second graph does the same but posits a different prior – one where the initial guess is that, quite strongly, the “coin” is a fair one. That makes a big difference for the first few primes but it does not take long before the two graphs look exactly the same.

with strongly fair prior

So does this lack of randomness indicate a shocking and previously undetected “hidden hand” in number theory? It seems not (I don’t claim to follow the maths of the paper at this point): the authors explain can explain the pattern using what we already know about primes and also point out that, as we further extend our examination we find the result is increasingly “fair” – in other words, the probability density spike would creep ever closer to 500 in the graphs above. (This also shows that even quite large sample sizes are relatively poor guides to the final result if the sampling is not truly random, but that is another matter entirely).

This is a reworked and consolidated post from what were the two previous posts on this subject.

Here’s the code

#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
#include <vector>
#include <cmath>

using namespace std;

void normalise(vector<long double>& input)
	long double sum = 0;		
	for (auto x: input) {
		sum += abs(x/sizeof(input));
	long double factor = 1/sum;
	for (auto& x: input) {
		x = x * factor;
	auto biggest = max_element(input.begin(), input.end());
	if (*biggest > 1) {
		factor = 1/(*biggest);	
		for (auto& x: input) {
			x = x * factor;

int main()
	vector<int> primes;
	//read in the primes file
	ifstream inputPrimes("/Users/adrian/Downloads/P-1000000.txt");
	//get the primes
	string rawPrimeLine;
	while (getline(inputPrimes, rawPrimeLine)) {
		string ordinal;
		string prime;
		istringstream stringy(rawPrimeLine);
		getline(stringy, ordinal, ',');
		getline(stringy, prime, ',');

	//calculate bayes inference
	ofstream bayesOut("/Users/adrian/bayesout.csv");
	vector<long double> distribution;
	for (int i = 0; i <= 1000; i++) { 
                distribution.push_back(0.0001 + pow(500 - abs(i - 500), 4));
                // distribution.push_back(0.001);
        int last = 1; //7%3 = 1 
        int count = 2;
        int heads = 1;
        //heads = switch
        //tails = no switch
        //0 = all tails, 1 = all heads 
        for (auto p: primes) {
                if (p > 7 && count < 1010) {
                        int residue = p%3;
                        long double hVal = 0.0;
                        if (residue != last) {
                        for (auto& x: distribution) {
                                x *= pow(hVal, heads) * pow(1 - hVal, count - heads);
                                hVal += 0.001;
                        last = residue;
                        int numb = 0;
                        for (auto x: distribution) {
                               if (numb++ > 0) {
					bayesOut << ",";
				bayesOut << x;
			bayesOut << endl;
#!/usr/bin/env Rscript

optos = ani.options(nmax=1000)
for (i in 1:ani.options("nmax")) {
	matplot(t(w2)[,i], ylim=c(0, 1), axes=FALSE, ylab="Probability", xlab=c("Outcome: ", i), type="l")
	axis(side = 1, at = c(0, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000))
	axis(side = 2, at = c(0, 1))
}}, interval=0.15,"primes.gif", ani.width=400, ani.height=400)

The opportunities of AI

Embed from Getty Images

A computer program – AlphaGo – has now beaten the world’s greatest Go player and another threshold for AI has been passed.

As a result a lot of media commentary is focusing on threats – threats to jobs, threats to human security and so on.

But what about the opportunities? Actually, eliminating the necessity of human labour is a good thing – if we ensure the benefits of that are fairly distributed.

The issue is not artificial intelligence – a computer is just another machine and there is nothing magic about “intelligence” in any case (I very much follow Alan Turing in this). The issue is how humans organise their societies.