Ten Great Ideas About Chance: A Review


Unfortunately, Ten Great Ideas About Chance is a disappointment.

The central idea of the book is to look at ten key mathematical-philosophical ideas in probability and, using the history of the idea, explain what they are about and why they matter.

It’s not that the book lacks interesting material, but it misses its target over and over again and, unfortunately, contains some obvious – and presumably some not so obvious – errors.

This review states it so much better than I can, so here is an extract:

The chapters are invariably a mix of
1. a trivial example that does not penetrate enough the intended topic because it contains too much of the familiar and too little of the topic that’s being introduced
2. references to original texts that are impenetrable nineteenth century translations into English from eighteenth century originals written in French or German or Latin
3. statements of complex results that would take fifty pages to arrive at if the proofs were shown
4. cheerleading

So what I re-lived by reading this book is my Freshman Year nightmare math class where three times a week I’d follow the first five minutes of the lecture only to subsequently find myself furiously copying from the board so I can read my lecture notes later at home and try to make sense of them.

What’s the evidence your numbers are faked?


Currently reading: Ten Great Ideas About Chance.

It can be a bit of a frustrating read – a lot of the chapters about gambling and judgement seem to me to be poorly explained – especially if you’ve ever actually placed a bet on anything (I am far from a regular gambler – I’m not sure I’ve placed a bet on anything in the last decade, but I still know how it works).

But it also has lots of exceptionally interesting stuff: not least the revelation that in naturally occurring sets of numbers the leading digit tends to be 1 about 0.301 of the time (this is Benford’s law), whilst fraudsters (including election fixers and presumably scientific experiment fixers) tend to pick leading digits uniformly, and so use such numbers only 0.111… (1/9) of the time.
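Benford’s law says a leading digit d should turn up with probability log10(1 + 1/d) – here is a quick check in R (my own snippet, not from the book):

# Benford's law: P(leading digit = d) = log10(1 + 1/d), for d = 1..9
round(log10(1 + 1/(1:9)), 3)
# 0.301 0.176 0.125 0.097 0.079 0.067 0.058 0.051 0.046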

So, taking my life in my hands – what about the results in my PhD? (Read the thesis here.) I am actually calculating this as I go along – so I really do hope it doesn’t suggest fraud!

Anyway – first table, which has a calculated SPECmark figure based on other researchers’ work – so not really my findings, though the calculations are mine.

Numbers are: 35, 22.76, 13.41, 7.26, 3.77, 1.94, 1.01, 0.55, 0.31, 0.20 and 0.14

And (phew) that means 4 out of 11 (0.364) begin with 1, so the table looks fraud-free on this test.
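For what it’s worth, here is a minimal R sketch of the sort of check I am doing by hand (not code used anywhere in the thesis):

# strip the sign, any leading zeros and the decimal point, then take the
# first digit (zeros have no leading digit and come out as NA)
first_digit <- function(x) {
  stripped <- sub("^[^1-9]*", "", format(abs(x), scientific = FALSE))
  as.integer(substr(stripped, 1, 1))
}

specmarks <- c(35, 22.76, 13.41, 7.26, 3.77, 1.94, 1.01, 0.55, 0.31, 0.20, 0.14)
mean(first_digit(specmarks) == 1)   # 0.3636..., i.e. 4 out of 11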

But, as I say, while those are my calculations, they are not my findings at root.

So moving on to some real findings.

First, the maximum total time the simulation takes to complete each of a set of benchmarks, measured in simulated cycles: 43361267, 1428821, 1400909, 5760495, 11037274, 2072418, 145291356, 5012280.

Here fully half of the numbers begin with 1. Hmmmm.

The mean time for the same: 42368816, 1416176, 1388897, 5646235, 10964717, 2026200, 143276995, 4733750.

Again, half begin with one.

What about the numbers within the results – in this case the number of cycles lost waiting on other requests?

The numbers are: 35921194, 870627, 844281, 4364623, 1088954, 1446305, 110996151, 3420685

Now the proportion has fallen to 3/8 (0.375) and I can feel a bit more comfortable (not that the other results suggested I was following the fraudsters’ one-ninth pattern in any case).

Later I produce numbers for an optimised system. How do they perform?

The maxima now become: 2514678, 1357224, 1316858, 3840749, 10929818, 1528350, 102077157, 3202193.

So the proportion beginning with 1 has actually risen to 5/8.

And the means show a similar pattern. Worse news with the blocks though. They now become: 730514, 775433, 735726, 2524815, 806768, 952774, 64982307, 1775537. So I am left with 1/8 (0.125) starting with 1 – something dangerously close to 1/9.

Can I save myself? I hope so … the figures above are for the first iteration of the system, but when we look at subsequent iterations a different pattern (for the blocks) emerges: 130736, 0, 0, 1612131, 97209, 232131, 64450433, 1117599.

This is now back to 3/8.

Of course, I didn’t cheat, and I also suspect (I would, wouldn’t I?) that the block count is the better test of whether I had: the overall execution time of a benchmark is, in some sense, a function of the length of its execution path, and that is effectively determined in advance, whereas the blocking in the system is an emergent property – if I had faked the whole thing I would have had less to go on and would have been much more likely to make it up.

Well, that’s my story and I’m sticking to it.



Puzzle about an M/G/1 queue


I am deeply puzzled by a question about the behaviour of an M/G/1 queue – i.e., a queue with a Markovian (Poisson) arrival process, a General distribution of service times and 1 server. I have asked about this on the Math Stackexchange (and there’s now a bounty on the question if you’d like to answer it there) – but as I am getting nowhere with it, I thought I’d ask it here too.

(This is related to getting a more rigorous presentation on thrashing into my PhD thesis.)

Consider an M/G/1 queue with Poisson arrivals of rate \lambda – this comes from Cox and Miller’s (1965) “The Theory of Stochastic Processes” (pp 240–241) and also Cox and Isham’s 1986 paper “The Virtual Waiting-Time and Related Processes”.

My question is what is the difference between (using the authors’ notation) p_0 and p(0,t)? The context is explained below…

In the 1965 book (the 1986 paper presents the differentials of the same equations), X(t) is the “virtual waiting time” of a process and the book writes of “a discrete probability p_0(t) that X(t)=0, i.e., that the system is empty, and a density p(x,t) for X(t)>0”.

The system consumes virtual waiting time in unit time, i.e., if X(t)\leq\Delta t and there are no arrivals in time \Delta t then X(t + \Delta t) = 0.

The distribution function of X(t) is then given by:
F(x,t)=p_0(t)+\int_{0}^{x}p(z,t)dz

They then state:
p(x, t+ \Delta t) = p(x + \Delta t, t)(1 - \lambda \Delta t) +p_0(t)b(x)\lambda\Delta t + \int_{0}^{x}p(x - y, t)b(y)dy\lambda\Delta t + o(\Delta t)

I get all this – the first term on the RHS is a run-down of X(t)>0 with no arrivals, the second adds b(x) of service time when the system is empty at t, and the third, convolution-like, term adds b(y) of service time from an arrival when the system is not empty at t. (The o(\Delta t) term accounts for there being more than one arrival in \Delta t; it tends to zero faster than \Delta t and so drops out in the limit \Delta t \to 0.)

And … and this is where I have the problems …

p_0(t+\Delta t)=p_0(t)(1-\lambda\Delta t) +p(0,t)\Delta t(1 - \lambda\Delta t) + o(\Delta t)

The first term of the RHS seems clear – the probability that the system is empty at t multiplied by the probability there will be no arrivals in \Delta t, but the second is not clear to me at all.

I assume this term accounts for the probability of the system “emptying” during \Delta t, but I don’t see how that works. Is anyone able to explain?

In other words, how does p(0,t)\Delta t(1 - \lambda\Delta t) represent this draining? Presumably (1 - \lambda\Delta t) again represents the possibility of zero arrivals in \Delta t, so how does p(0, t)\Delta t represent the X(t) \leq \Delta t situation?

If we take the equilibrium situation where p_0(t) = p_0 and p(x, t) = p(x) then, differentiating and setting p^{\prime}_0 = 0, we get p(0) = \lambda p_0 – so, again, what does p(0) represent?
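For completeness, here is the differentiation step (my own working, so treat with care). Rearranging the last displayed equation and dividing by \Delta t:

\frac{p_0(t+\Delta t)-p_0(t)}{\Delta t} = -\lambda p_0(t) + p(0,t)(1 - \lambda\Delta t) + \frac{o(\Delta t)}{\Delta t}

and letting \Delta t \to 0:

p^{\prime}_0(t) = -\lambda p_0(t) + p(0,t)

so at equilibrium, where p^{\prime}_0 = 0, we are forced to p(0) = \lambda p_0.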

Learnt this week… 24 January


My friend and former colleague Adam Higgitt every Friday posts a list of “five things I have learned this week”. It’s popular and good fun – especially as Adam is not afraid of an argument if you challenge some of his claims.

For a while I tried to do the same thing myself, but failed miserably.

I am not going to try again, but I am proposing to try something different, if inspired by Adam.

So here is the first list of things “learnt this week”: scientific or mathematical facts and amusements. I will aim for five, but this week I just did not make it.

1. A random walk can be used to build a binomial distribution – but not a very good one!

Imagine a left-right ruled line centred on zero and a marker that can, in every time step, move either left or right by one step, where the probabilities of moving left, p_l, and of moving right, p_r, are equal: i.e., p_l = p_r = 0.5. At the “beginning of time” the marker stands at 0.

Then, if we count the times the marker is at any given position, the counts will look roughly binomial (strictly speaking, it is the marker’s position after a fixed number of steps that is binomially distributed – the accumulated visit counts of one long run only loosely echo that shape). The BASIC code below (which I wrote using BINSIC) should give you an idea (this code runs the risk of an overflow, though, of course, and the most interesting thing about it is how unlike a binomial distribution the results can be).


5 REM RANDOM WALK: COUNT VISITS TO EACH POSITION
10 DIM A(1001)
11 REM CLEAR THE VISIT COUNTS
12 FOR I = 1 TO 1001
14 LET A(I) = 0
16 NEXT I
18 REM START IN THE MIDDLE - INDEX 500 PRINTS AS POSITION 0
20 LET POS = 500
30 FOR I = 1 TO 50000
38 REM EQUAL CHANCE OF A LEFT OR RIGHT STEP
40 LET X = RND * 2
50 IF X > 1 THEN LET POS = POS + 1 ELSE LET POS = POS - 1
55 REM POS CAN WALK OFF THE ENDS OF A() - THE OVERFLOW RISK MENTIONED ABOVE
60 LET A(POS) = A(POS) + 1
70 NEXT I
80 PRINT "*****BINOMIAL DISTRIBUTION*****"
90 FOR I = 1 TO 1001
95 LET X = I - 500
110 PRINT X," ",A(I)
120 NEXT I
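By way of contrast, here is a little sketch in R (my own, nothing to do with BINSIC) that runs many independent walks and tallies the final positions – these do settle towards the binomial shape:

# run 10,000 independent 100-step walks and record where each one ends
walks <- 10000
steps <- 100
finals <- replicate(walks, sum(sample(c(-1, 1), steps, replace = TRUE)))
# (finals + steps)/2 follows Binomial(steps, 0.5), so the tally of final
# positions is a shifted, stretched binomial
plot(table(finals), xlab = "final position", ylab = "count")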

Here’s a chart of the values generated by similar BASIC code (actually run for about 70,000 steps):
(Chart: not much like a binomial distribution.)

2. Things that are isomorphic have a one-to-one relationship

Up to this point I just had an informal “things that look different but are related through a reversible transformation” idea in my head. But that’s not fully correct.

A simple example is the logarithm: every positive real number has a unique logarithm, and the map takes multiplication of positive reals to addition of their logarithms – a one-to-one correspondence that preserves the structure.
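In symbols (my gloss): the map \log : (\mathbb{R}_{>0}, \times) \to (\mathbb{R}, +) is a bijection satisfying

\log(xy) = \log x + \log y

so multiplication of positive reals and addition of reals are, structurally, the same operation seen through different eyes.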


Some questions about the science of magic chocolate



I have to be careful here, as it’s not unknown for bloggers to be sued in the English courts for the things they write about science. So I will begin by saying I am not, and have no intention of, casting aspersions on the integrity of any of the authors of the paper I am about to discuss. Indeed, my main aim is to ask a few questions.

The paper is “Effects of Intentionally Enhanced Chocolate on Mood“, published in 2007 in issue 5 of volume 3 of “Explore: The Journal of Science and Healing” by Dean Radin and Gail Hayssen, both of the Institute of Noetic Sciences in California, and James Walsh of Hawaiian Vintage Chocolate.

The reason it came to my attention today is because it was mentioned in the “Feedback” diary column of the current issue of the New Scientist:

the authors insist that in “future efforts to replicate this finding… persons holding explicitly negative expectations should not be allowed to participate for the same reason that dirty test tubes are not allowed in biology experiments”. [Correspondent] asks whether this may be “the most comprehensive pre-emptive strike ever” against any attempt to replicate the results.

But I want to ask a few questions about the findings of the report which are, in summary, that casting a spell over chocolate makes it a more effective mood improver.

In their introduction to the paper the authors state:

Cumulatively, the empirical evidence supports the plausibility that MMI [mind-matter interaction] phenomena do exist.

Unfortunately, the source quoted for this is a book – Entangled Minds – so I cannot check whether it is based on peer-reviewed science. But you can read this review (as well as those on Amazon) and make your own mind up.

Again, not doubting their sincerity, I do have to question their understanding of physics when they state:

Similarities between ancient beliefs about contact magic and the modern phenomenon of quantum entanglement raise the possibility that, like other ethnohistorical medical therapies once dismissed as superstition – eg, the use of leeches and maggots in medicine – some practices such as blessing food may reflect more than magical thinking or an expression of gratitude.

The study measured the mood of the eaters of chocolate over a week. Three groups ate chocolate “blessed” in various ways and one ate unblessed chocolate.

The first thing that is not clear (at least to me) is the size of each group. The experiment is described as having been designed for 60 participants, but the paper then states that 75 signed informed consents before reporting that 62 “completed all phases of the study”. Does that mean that 13 dropped out during it? As readers of Bad Pharma will know, it is an error to simply ignore drop-outs (if they are there – as I say, it is not clear).

The researchers base their conclusion that –

This experiment supports the ethnohistorical lore suggesting that the act of blessing food, with good intentions, may go beyond mere superstitious ritual – it may also have measurable consequences

– substantially on the changes in mood on one day – day 5 of the 7.

The researchers say that the p-value for their finding on that day is 0.0001 – i.e., if chance alone were at work, a result at least this extreme would turn up only about 1 time in 10,000.

I have to say I am just not convinced – not by their statistics, which I am sure are sound, but by the argument. Too small a sample, too short a period, too many variables being measured (i.e., days, different groups), a lack of clarity about participation and so on. But I would really appreciate it if someone with a stronger background in statistics than me had a look.
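To put some flesh on the “too many variables” point, here is a toy simulation of my own (nothing to do with the paper’s actual data – the 7 days and 4 groups are just taken from its design as described above). If you make 28 independent looks at pure noise, the chance of at least one coming up “significant” at p < 0.05 is large:

# toy illustration: 28 independent null tests (7 days x 4 groups);
# under the null hypothesis, p-values are uniform on (0,1)
set.seed(42)
trials <- 10000
tests <- 7 * 4
hits <- replicate(trials, any(runif(tests) < 0.05))
mean(hits)   # roughly 1 - 0.95^28, i.e. about 0.76

A p of 0.0001 would survive a crude Bonferroni correction for 28 comparisons, but the general point about multiple looks stands.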

I love R already, but I have not got a clue how to use it



Thanks to Professor Paul A. Rubin, I have discovered GNU R and already I love it and think it is just the tool I need to plot a few of the graphs (of turnaround times for tasks) in my MSc project.

But there is one problem – I don’t really know how to use it, and the online notes do not seem to me to be written in a particularly helpful, tutorial-like way. All I want to do is plot some data, and I do not really want to do it interactively either.
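From what I can glean so far, something like this minimal non-interactive script (the file names are made up) seems to be the idea – run from the shell with Rscript:

# read task and turnaround-time columns from a CSV and plot them to a PNG
data <- read.csv("turnaround.csv")
png("turnaround.png", width = 800, height = 600)
plot(data$task, data$turnaround, type = "l",
     xlab = "task", ylab = "turnaround time")
dev.off()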

Still, I guess it will come with time. I have ordered this – R in a Nutshell – but I am not sure that is going to do the job, either.

Try and fail, try again and fail better.