Some people – Yes supporters, essentially – are claiming it is plain that the opinion polls in the Scottish independence referendum are rigged because they have never been asked. (None of those polls has, so far at least, reported a Yes lead – I hope I am not tempting fate.)

Well, there is a simple reason for that: polls are small and the electorate is very large.

If we assume every elector has an equal chance of being asked (which is not true in many cases: if you are not on an online panel it just won’t happen), and that each poll asks 1,200 electors, then the chance of you being asked in any given poll is 1,200/4,000,000, or about 1 in 3,333: a bit better than winning the lottery jackpot, I’d admit, but who bets on a 3,332/1 chance?

Of course, there are multiple polls, but to have even a 1 in 100 chance of being asked, 33 polls would have to be taken. To make it more likely than not that you had been polled, about 2,300 polls would have to be taken (simply adding up the chances suggests 1,667, but the chances compound rather than add, so the true figure is higher).

What Scotland Thinks, at the time of writing, records 80 polls on the referendum question – so the chances of any individual elector being asked are (given all my approximations) about 1 in 42, or in bookies’ odds terms, it’s a 41/1 shot.
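The arithmetic can be checked with a minimal sketch. It keeps the same crude assumptions as the text (an electorate of 4,000,000, 1,200 respondents per poll, everyone equally likely to be asked) and adds one more: that the polls are independent draws, so the chances compound rather than simply add. The function names are illustrative only.

```python
import math

ELECTORATE = 4_000_000   # rough electorate size used in the text
SAMPLE_SIZE = 1_200      # assumed respondents per poll


def chance_asked_at_least_once(polls, sample=SAMPLE_SIZE, electorate=ELECTORATE):
    """Probability of being picked in at least one of `polls` independent polls."""
    p_single = sample / electorate
    return 1 - (1 - p_single) ** polls


def polls_needed(target, sample=SAMPLE_SIZE, electorate=ELECTORATE):
    """Smallest number of polls giving at least a `target` chance of being asked."""
    p_single = sample / electorate
    return math.ceil(math.log(1 - target) / math.log(1 - p_single))


if __name__ == "__main__":
    p80 = chance_asked_at_least_once(80)
    print(f"80 polls: about 1 in {1 / p80:.0f}")
    print(f"polls needed for an even chance: {polls_needed(0.5)}")
```

For 80 polls the compounded figure is almost identical to the simple sum of the per-poll chances. The two only drift apart for very large numbers of polls, where the number needed for an even chance comes out higher than a simple sum suggests.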

If you think a race is fixed because your 41/1 wager never comes home, I’d suggest you weren’t to be trusted in a betting shop.

Update: I should make it clear this is a pretty crude approximation to make a point – opinion poll sample sizes vary, and if they are closer to 1,000 then the odds against you being asked lengthen to about 49/1 (i.e., it’s a fair bit less likely).

A further update: My intention in writing this was to demonstrate, in broad-brush terms, why an argument based on “I have never been polled so therefore the polls are wrong” didn’t hold any water. It seems the article is now being touted around as an exact prediction of how likely it was you’d been asked: it’s not. As I say above, much (probably most) polling these days is via online panel – if you are not on the panel you cannot be asked to begin with.

In the year ahead one of the biggest – probably the biggest – political stories in the UK will be the September referendum on whether Scotland should leave the UK.

I am not going to comment here on what I hope the outcome will be – other than to say I hope and believe there will be a strong ‘no’ vote.

But I am going to take issue with how the campaign is reported and, in particular, the dismal way in which opinion polls are covered.

My ire has been provoked by a claim by a columnist in today’s Scotsman that a 1% change in one side’s support between two polls in September and December indicates the race is “tightening”.

My argument is that it indicates nothing of the sort. The two polls are essentially mathematically identical. I realise that “things just the same” does not, as a headline, sell many papers, but that does not make it acceptable to invent new mathematical facts where none exist. The fact that opinion polls today show essentially the same result as opinion polls of two months ago (and, in this case, two years ago and twenty years ago) may be a journalistic disappointment, but it is also the reality.

So here is my brief guide to the mathematics of opinion polls. If you want to know more, I strongly recommend the classic Statistics without Tears: An Introduction for Non-Mathematicians which, as the subtitle suggests, gives the reader a clear grounding without requiring a lot of maths knowledge.

I will begin with a few ground rules…

Firstly, remember what a poll is based on: not the truth about people’s opinions but what they say their opinions are. If some people systematically lie to pollsters (as, in certain cases, it is known they do because they might be afraid or ashamed to tell the truth) then your poll is flawed from the start. And the best you can say of any poll’s accuracy is that it is as good as the best poll can be.

Secondly, the best we can say about a poll is that, if conducted properly, it has a given degree of accuracy compared to any other poll. So when people talk of a “margin of error” in a poll, what they typically mean is that 95% of all properly conducted polls will give an answer within that margin of error. (This is both an amplification of the first point and completely independent of it – if people lie then they will likely lie to all pollsters and so no polls are immune.)

Thirdly, it is a mathematical fact that for even the best conducted polls, we should expect one in twenty to give results outside that “margin of error” – this isn’t because we can expect pollsters to mess it up one time in twenty, but because of the mathematical rules of the universe in which we live. It is an unavoidable feature of opinion polling. And because it is unavoidable we do not know which of the polls is the “rogue” and whether any seeming shift (or non-shift, remember) is because of this “rogue” effect or because of a real change in what people are likely to say to opinion pollsters.

And now a little bit of maths…

Claims about polling accuracy are based on the fact that opinion poll results (surveys of a small part of the population from which we hope to draw conclusions about the whole population) will be distributed about the “real” result (i.e., the answer we’d get if we asked every single person) in a bell-shaped “normal distribution”. The maths of this “normal distribution” are very well understood and so we can make some well-grounded claims about the potential accuracy of our polls.

These include the fact that, above a basic minimum sample size, the margin of error in our poll (i.e., the error compared to other polls) varies with the inverse of the square root of the sample size. Or, to be blunt about it, a poll with 2,000 respondents is not twice as precise (i.e., with half the margin of error) as one with 1,000, but merely 1.4 times more precise, while going from 500 to 2,000 shrinks the margin of error not by a factor of 4 but by a factor of 2. (You can tell straight away that the economics of large-scale polling are a bit perverse: if you go from a 1,000 to a 10,000 sample poll, your costs increase by a factor of 10, but the margin of error only shrinks by a factor of about 3.)
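The square-root rule is easy to see in numbers. The sketch below uses the standard textbook approximation for a simple random sample, with two assumptions of mine that are not in the text above: an evenly split question (p = 0.5, the worst case) and the usual 1.96 multiplier for a 95% level. Real polls use weighting and panels, so treat the figures as indicative only.

```python
import math

Z_95 = 1.96  # standard normal multiplier for a 95% level (assumed here)


def margin_of_error(n, p=0.5, z=Z_95):
    """Approximate margin of error for a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    for n in (500, 1000, 2000, 10000):
        print(f"n={n:>6}: +/- {100 * margin_of_error(n):.1f}%")
```

Doubling the sample divides the margin by the square root of 2, and a tenfold increase divides it by a little over 3, whatever value of p is assumed.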

The “one-in-twenty will be rogue” rule comes from the fact that when we talk about the margin of error in a poll, what we really mean is that in 95% of all polls the result will be in a band twice the size of the margin of error, centred on the result we have published. This band is the 95% “confidence interval” (more precisely, it is the band of roughly two “standard errors” in each direction about the sample mean).
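A small simulation makes the one-in-twenty point concrete. This is only a sketch under stated assumptions: the “true” level of support (40%), the sample size (1,000) and the number of simulated polls are hypothetical values I have picked for illustration, and each simulated poll is a perfectly conducted simple random sample. It counts how often the poll result lands within the 95% margin of the true figure.

```python
import random


def share_within_margin(true_share=0.40, sample_size=1000, polls=2000, seed=1):
    """Fraction of simulated polls whose result lies within the 95% margin
    of error of the (hypothetical) true share."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    margin = 1.96 * (true_share * (1 - true_share) / sample_size) ** 0.5
    inside = 0
    for _ in range(polls):
        # Ask `sample_size` random electors; each says yes with prob. true_share.
        yes = sum(rng.random() < true_share for _ in range(sample_size))
        if abs(yes / sample_size - true_share) <= margin:
            inside += 1
    return inside / polls


if __name__ == "__main__":
    print(f"share of polls inside the margin: {share_within_margin():.3f}")
```

Even with flawless fieldwork, the share inside the margin hovers around 95%, so roughly one simulated poll in twenty falls outside it through chance alone.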

You may interject now and say “but that doesn’t mean a 1% difference is not real” and you would be right – if you are willing to live with a lower confidence level or pay for a very much bigger sample. So, to make a 1% figure “real” we might be prepared to live with a margin of error of 0.5% on either side of the reported poll result. We could get that in two ways: shelling out to increase the sample size to roughly 40,000 (compared to the typical 1,000), which would keep our 95% confidence interval, or accepting that, with a sample of 1,000, about 75% of polls would give a result that was not within +/- 0.5% of our figure. Crudely, we would be far more likely to be wrong than right when we claimed the 1% was a “real” shift (we would have roughly a 25% confidence interval).
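Both routes can be put into numbers with a short sketch. It assumes a simple random sample, an even split (p = 0.5) and the normal approximation, none of which holds exactly for real polls. The function names `sample_size_for` and `coverage` are mine, chosen for illustration.

```python
import math
from statistics import NormalDist


def sample_size_for(margin, confidence=0.95, p=0.5):
    """Respondents needed for a given margin of error at a given confidence level."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


def coverage(margin, n, p=0.5):
    """Share of polls of size n expected to land within +/- margin."""
    standard_error = math.sqrt(p * (1 - p) / n)
    z = margin / standard_error
    return 2 * NormalDist().cdf(z) - 1


if __name__ == "__main__":
    print(f"sample for +/-0.5% at 95%: {sample_size_for(0.005):,}")
    print(f"coverage of +/-0.5% with n=1000: {coverage(0.005, 1000):.0%}")
```

On these assumptions the required sample for a 0.5% margin is a little under 40,000, and a ±0.5% band around a poll of about 1,000 people captures well under half of all such polls.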