Is dark matter locked up in primordial black holes?


Dark matter pie (Photo credit: Wikipedia)

To be honest, I have an issue with both “dark matter” and “dark energy” – they both look like close-to-metaphysical constructs to me: we have a hole where theory and observation do not match so we’ll invent this latter-day phlogiston and call it “dark”.

Then again, I’m not really qualified to comment and it is pretty clear that observations point to missing mass and energy.

I have another issue – if most of the mass of the universe is in “dark matter”, why is there no obvious evidence of it nearby? I don’t mean why can’t we see it – obviously that is the point – but even though we sit at a local gravitational field maximum (the Earth), we are struggling to see any local mass effects. (For instance, this paper talks about how “local” conditions should impact any dark matter wind but, as far as I can see at least, it’s all entirely theoretical: we haven’t seen any dark matter wind at all.)

So the suggestion – reported in the New Scientist – and outlined in this paper – that actually dark matter is locked up in black holes caused by sound wave compression in the earliest moments of the universe has an appeal. It also potentially fits with the observational evidence that dark matter appears to be knocking around in the halos of galaxies.

These primordial black holes are not a new concept – Bernard Carr and Stephen Hawking wrote about them in 1974 (in this paper). The new evidence for their potential as stores of dark matter comes from the already famous Laser Interferometer Gravitational-Wave Observatory (LIGO) experiment, which leaves open the prospect that the two black holes that generated the detected gravitational waves could both lie in a galactic halo and be in the right mass range to suggest that they, and similar bodies, account for the gravitational pull we attribute to dark matter.

All this is quite speculative – the paper points the way to a new generation of experiments rather than proclaiming an epoch-making discovery, but it’s obviously very interesting and also suggests that the long search for WIMPs – the hypothesised weakly interacting massive particles that have previously been the favourite explanation for dark matter – has essentially been in vain.

Primes are not random


An article in the New Scientist – here – discusses this paper on arxiv on the non-random distribution of the prime numbers.

So what? After all, we know that primes are not randomly distributed and become increasingly less common as we progress up the “number line” – with the Prime Number Theorem telling us that the number of primes less than or equal to x is \pi(x) \sim \frac{x}{\ln(x)}.

(This also takes us to perhaps the best-known paradox of infinity – even though we know that primes become increasingly rare as we count higher, we also know there are infinitely many of them: if there were a maximum prime P, then multiplying together all the primes up to and including P and adding 1 would give a number not divisible by any of those primes, so it must either be prime itself or be divisible by some prime bigger than P – either way a contradiction, which demonstrates there can be no maximum prime and hence there must be an infinite number of primes. For example, 2 \times 3 \times 5 \times 7 \times 11 \times 13 + 1 = 30,031 = 59 \times 509 – not itself prime, but its prime factors are both bigger than 13.)

But the lack of randomness here is something deeper – there is a connection between successive primes, something rather more unexpected.

For any prime greater than 5, written in base 10, we know it must end in 1, 3, 7 or 9. Numbers with the even endings (0, 2, 4, 6, 8) are divisible by 2, while those that end in 5 are divisible by 5 (this is what made learning the five times table so easy for me in Mrs McManus’s P3 class forty-three years ago – the girls on the street loved a skipping rhyme and all I had to do was sing it in my head: five, ten, fifteen, twenty…).

The thinking was that these endings would be randomly distributed, but they are not (a simple result to demonstrate, so it is a sign that not all interesting maths research is yet beyond us mere mortals.)

To simplify things a bit, though, I am (as does the arxiv paper) going to look at the residues the primes leave in modulo 3 counting. Essentially this just means the remainder we get when we divide the prime by three (for any prime greater than 3 there has to be a remainder, of course, because otherwise the number would be divisible by 3 and so not prime). Obviously these residues have to be 1s or 2s – e.g., 5 has a residue of 2, 7 a residue of 1 and so on.

What we are really interested in is the sequence of these residues. If successive primes were wholly independent then we would expect the possible pairs – (1, 1), (1, 2), (2, 1) and (2, 2) – to be equally likely. Using mathematical notation we describe the frequency of these sequences like this: \pi(x_0; 3, (1, 1)) for the (1, 1) sequence and so on.

For the first million primes we would expect each sequence to have 250,000 occurrences or something very close to that. But we don’t see that at all:

\pi(x_0; 3, (1,1)) = 215,837
\pi(x_0; 3, (1,2)) = 283,957
\pi(x_0; 3, (2,1)) = 283,957
\pi(x_0; 3, (2,2)) = 216,213
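
Counts like these are straightforward to reproduce. Here is a minimal sketch that tallies consecutive residue pairs – it assumes a plain text file with one prime per line ("primes.txt" is just a placeholder: the list I actually used, read by the code at the end of this post, is a CSV of ordinal,prime pairs):

#include <fstream>
#include <iostream>
#include <map>
#include <string>
#include <utility>

//Tally how often each pair of successive residues (mod 3) occurs.
//Assumes one prime per line in "primes.txt" - a placeholder, so adjust
//the parsing to match whatever prime list you actually have.
int main()
{
	std::ifstream in("primes.txt");
	std::map<std::pair<int, int>, long> counts;
	long prime;
	int last = -1;
	while (in >> prime) {
		if (prime <= 3) {
			continue; //2 and 3 fall outside the pattern
		}
		int residue = prime % 3; //always 1 or 2 for primes above 3
		if (last != -1) {
			counts[{last, residue}]++;
		}
		last = residue;
	}
	for (const auto& c: counts) {
		std::cout << "(" << c.first.first << ", " << c.first.second
			<< "): " << c.second << std::endl;
	}
}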

Using a list of primes I found online and some C++ and R (listed at the end if you want to mess about with it yourself), I decided to use a little Bayesian inference to see how non-random the primes are.

I treated the sequences as though they were coin tosses – a switch, (1, 2) or (2, 1), is the equivalent of a head, while a recurrence, (1, 1) or (2, 2), is the equivalent of flipping a tail. If the coin were “fair” then in 1,000 flips we’d expect 500 of each type. The result for the first 1,000,000 primes, though, suggests we’d actually get about 568 heads in every 1,000 flips.

(I discussed Bayesian testing for coin fairness some time ago – here.)

With Bayesian inference we can posit a “prior” – a guess, educated or otherwise – as to how fair the coin we are testing might be. If we know nothing we might start with a prior that assigns equal likelihood to every possible result, or we might, in this case, go for the (in retrospect naive) view that the most likely outcome is fairness – that subsequent primes are independent of those that went before.
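
To make the mechanics concrete, here is a stripped-down sketch of that sort of update over a discretised grid of candidate “head” probabilities – the observation stream is invented purely for illustration, and the code I actually used (listed at the end of the post) does its bookkeeping somewhat differently:

#include <cstdio>
#include <vector>

//A stripped-down grid-based Bayesian update: keep a weight for each
//candidate "head" probability, multiply by the likelihood of each new
//observation and renormalise. The observations below are invented.
int main()
{
	const int bins = 1001; //candidate probabilities 0.000 .. 1.000
	std::vector<double> belief(bins, 1.0); //"know nothing" prior: all equally likely

	//1 = switch ("head"), 0 = repeat ("tail") - purely illustrative data
	int observations[] = {1, 1, 0, 1, 0, 1, 1, 1, 0, 1};

	for (int obs: observations) {
		double total = 0.0;
		for (int i = 0; i < bins; i++) {
			double h = i / 1000.0; //candidate head probability
			belief[i] *= (obs == 1) ? h : (1.0 - h); //multiply by likelihood
			total += belief[i];
		}
		for (int i = 0; i < bins; i++) {
			belief[i] /= total; //renormalise so the weights sum to 1
		}
	}

	//report the most probable candidate after all the observations
	int best = 0;
	for (int i = 1; i < bins; i++) {
		if (belief[i] > belief[best]) {
			best = i;
		}
	}
	printf("most probable head probability: %.3f\n", best / 1000.0);
}

With seven “heads” in ten invented observations and a flat prior, this sketch settles on 0.7 as the most probable head probability – which is all the full-scale version below is doing, just with a thousand primes’ worth of data.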

The first graph – below – is based on saying we are ignorant of the likely result and so assigns an equal likelihood to each result. (A probability of 1 indicates absolute certainty that this is the only possible outcome, while a probability of 0 indicates there is literally zero chance of this happening – it should be noted that using Bayesian inference means we are looking at a prediction for all primes, not simply the results of the first 1000 we test.)

First thousand primes

This looks at the first thousand primes and predicts we’d get something like 650 – 660 heads (if you look closely you’ll see that even as the probability density graph becomes narrower it also moves slowly to the left.)

The second graph does the same but posits a different prior – one where the initial guess is that, quite strongly, the “coin” is a fair one. That makes a big difference for the first few primes but it does not take long before the two graphs look exactly the same.

with strongly fair prior

So does this lack of randomness indicate a shocking and previously undetected “hidden hand” in number theory? It seems not (I don’t claim to follow the maths of the paper at this point): the authors can explain the pattern using what we already know about primes, and they also point out that, as we extend the examination further, the result becomes increasingly “fair” – in other words, the probability density spike would creep ever closer to 500 in the graphs above. (This also shows that even quite large sample sizes are relatively poor guides to the final result if the sampling is not truly random, but that is another matter entirely.)

This is a reworked and consolidated post from what were the two previous posts on this subject.

Here’s the code

#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
#include <vector>
#include <cmath>
#include <cstdlib>	//for atol
#include <algorithm>	//for max_element

using namespace std;

//rescale the distribution so it can be plotted on a 0..1 axis
void normalise(vector<long double>& input)
{
	long double sum = 0;
	for (auto x: input) {
		sum += abs(x/input.size());
	}
	long double factor = 1/sum;
	for (auto& x: input) {
		x = x * factor;
	}
	//if anything now exceeds 1, rescale so the peak sits at exactly 1
	auto biggest = max_element(input.begin(), input.end());
	if (*biggest > 1) {
		factor = 1/(*biggest);
		for (auto& x: input) {
			x = x * factor;
		}
	}
}

int main()
{
	vector<int> primes;
	//read in the primes file
	ifstream inputPrimes("/Users/adrian/Downloads/P-1000000.txt");
	//get the primes
	string rawPrimeLine;
	while (getline(inputPrimes, rawPrimeLine)) {
		string ordinal;
		string prime;
		istringstream stringy(rawPrimeLine);
		getline(stringy, ordinal, ',');
		getline(stringy, prime, ',');
		primes.push_back(atol(prime.c_str()));
	}

	//calculate bayes inference
	ofstream bayesOut("/Users/adrian/bayesout.csv");
	vector<long double> distribution;
	//prior over 1001 candidate values of the head probability (0.000 to 1.000)
	for (int i = 0; i <= 1000; i++) {
		//strongly "fair" prior - a quartic spike centred on 0.5
		distribution.push_back(0.0001 + pow(500 - abs(i - 500), 4));
		//uncomment for the uniform ("know nothing") prior instead
		// distribution.push_back(0.001);
	}
	normalise(distribution);
	int last = 1; //7%3 = 1
	int count = 2;
	int heads = 1;
	//heads = switch
	//tails = no switch
	//0 = all tails, 1 = all heads
	for (auto p: primes) {
		if (p > 7 && count < 1010) {
			count++;
			int residue = p%3;
			long double hVal = 0.0;
			if (residue != last) {
				heads++;
			}
			//weight each candidate head probability by the
			//likelihood of the switches seen so far
			for (auto& x: distribution) {
				x *= pow(hVal, heads) * pow(1 - hVal, count - heads);
				hVal += 0.001;
			}
			normalise(distribution);
			last = residue;
			//write out one CSV row per prime processed
			int numb = 0;
			for (auto x: distribution) {
				if (numb++ > 0) {
					bayesOut << ",";
				}
				bayesOut << x;
			}
			bayesOut << endl;
		}
	}
}
And here is the R script that turns the CSV output into the animated graphs:

#!/usr/bin/env Rscript
library("animation")

# the CSV written by the C++ program has no header row
w2<-read.csv("/Users/adrian/bayesout.csv", header=FALSE)
saveGIF({
optos = ani.options(nmax=1000)
for (i in 1:ani.options("nmax")) {
	matplot(t(w2)[,i], ylim=c(0, 1), axes=FALSE, ylab="Probability", xlab=paste("Outcome:", i), type="l")
	axis(side = 1, at = c(0, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000))
	axis(side = 2, at = c(0, 1))
	ani.pause()
}}, interval=0.15, movie.name="primes.gif", ani.width=400, ani.height=400)

The opportunities of AI


A computer program – AlphaGo – has now beaten the world’s greatest Go player and another threshold for AI has been passed.

As a result a lot of media commentary is focusing on threats – threats to jobs, threats to human security and so on.

But what about the opportunities? Actually, eliminating the necessity of human labour is a good thing – if we ensure the benefits of that are fairly distributed.

The issue is not artificial intelligence – a computer is just another machine and there is nothing magic about “intelligence” in any case (I very much follow Alan Turing in this). The issue is how humans organise their societies.

Russia Today (@RT_com) broadcasts fiction, not news


Last week’s New Scientist reports that Russia Today – the Kremlin’s propaganda channel subsidised to broadcast lies in support of the Russian Federation’s hostility to any country in Russia’s “near abroad” that dares to travel down the path of democracy and the rule of law – went one further when it started churning out stories about how the Zika outbreak was a result of a failed science experiment.

The basis of their report was “the British dystopian TV series Utopia“. Yes, they broadcast fiction as news and for once it was not a question of interpretation.

Here’s the product description from Amazon:

The Utopia Experiments is a legendary graphic novel shrouded in mystery. But when a small group of previously unconnected people find themselves in possession of an original manuscript, their lives suddenly and brutally implode.

Targeted swiftly and relentlessly by a murderous organisation known as The Network, the terrified gang are left with only one option if they want to survive: they have to run. But just as they think their ordeal is over, their fragile normality comes crashing down once again.

The Network, far from being finished, are setting their destructive plans into motion. The gang now face a race against time, to prevent global annihilation.

Wormholes and quantum entanglement


Been a while…

There is a fascinating article in this week’s New Scientist about the idea that quantum mechanics and general relativity could be linked via the idea of the “wormhole” – a fold in spacetime that links what appear to be two very distant parts of the universe.

The article – as is generally the case in a popular science magazine – is more hand wavy than scientific, but the concepts involved don’t seem to be difficult to grasp and they might answer some of the more mysterious aspects of quantum mechanics – especially the problem “spooky action at a distance“: quantum entanglement.

Quantum entanglement is the seeming evidence that two particles separated by distance exchange information instantaneously: when one particle changes state (or rather, when its state is observed), the other does too. The suggestion is that, actually, these two particles are not separated by distance at all but are linked by a wormhole.

Sounds like a piece of Hollywood science, for sure, but it is an idea based on our understanding of black holes – a prediction of Einstein’s general relativity that we have lots of (indirect) evidence for: these would seem to be surrounded by entangled particles – the so-called quantum firewall.

So, dear reader, I wrote a Gauss-Jordan solver


Number of steps in the Euclidean algorithm for GCD(x,y). Red points indicate relatively few steps (quick), whereas yellow, green and blue points indicate successively more steps (slow). The largest blue area follows the line y = Φx, where Φ represents the Golden ratio. (Photo credit: Wikipedia)

Been a while…

…Anyway: I am now designing (as part of my PhD) a simulation of a network-on-chip (NoC) system and I need it to run some code – so I decided to write a solver for systems of linear equations, built around Gauss-Jordan elimination.

It proved to be a lot more difficult than I had hoped: not because the maths was too complex, but because producing something I could later simplify from C++ into something that looks like synthetic assembly has proved, and is still proving, very difficult indeed.

My system needs to execute a simple synthetic assembly because what I am modelling is the way the memory interactions work – so I have to get close to the metal and not hide behind a high-level language (no, not even C will do). Obviously – or at least it feels like that to me – I don’t want to have to mess about with floating point implementations, so I wanted integer arithmetic.

And that is where the first problem arises – to implement a Gauss-Jordan solver (or even to solve linear equations in the way you used to at school: the Gauss-Jordan algorithm is merely a scaling-up of that), you have to divide numbers, and sadly (for this purpose) not every number divides evenly into another – so the only way to stay within integer maths is to use rationals. (Even something as simple as eliminating x from 2x + 3y = 7 and 3x + 5y = 11 means scaling the first row by \frac{3}{2}, and the fractions only multiply from there.)

That, of course, is not such a problem. The C++ pair utility even makes it easy to model a rational out of two integers (writing a separate rational class would be overkill for something as small as this.)

But then, quickly, another problem becomes apparent. Even using long long types to represent the numerator and denominator of our rational fraction, the repeated multiplications needed for fractional subtraction break the integer bounds after little more than a handful of row operations (and I want to model a large system – with literally thousands of equations in each system).

Even with a repeatedly applied gcd function using Euclid’s algorithm to keep the rational fractions in lowest terms, the numbers quickly go out of range.
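
To make the problem concrete, here is a minimal sketch of that pair-based rational arithmetic using plain long long values – the function names are my own invention, and it is the cross-multiplication in the subtraction that blows through the integer range once rows have been combined a few times:

#include <cstdlib>
#include <iostream>
#include <utility>

typedef std::pair<long long, long long> rational; //(numerator, denominator)

//reduce to lowest terms with Euclid's algorithm, keeping the denominator positive
rational reduce(rational r)
{
	long long a = std::llabs(r.first);
	long long b = std::llabs(r.second);
	while (b != 0) {
		long long t = b;
		b = a % b;
		a = t;
	}
	if (a == 0) {
		return rational(0, 1);
	}
	long long sign = (r.second < 0) ? -1 : 1;
	return rational(sign * r.first / a, sign * r.second / a);
}

//x - y: the cross-multiplications here are what eventually overflow long long
rational subtract(const rational& x, const rational& y)
{
	return reduce(rational(x.first * y.second - y.first * x.second,
		x.second * y.second));
}

int main()
{
	rational a(1, 3);
	rational b(1, 7);
	rational c = subtract(a, b); //1/3 - 1/7 = 4/21
	std::cout << c.first << "/" << c.second << std::endl;
}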

So I have been forced to rely on an arbitrary precision library – GNU’s GMP. The code now works well (you can see it at its GitHub repo) and the solver code is below:

#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
#include <vector>
#include <cstdlib>
#include <utility>
#include <gmpxx.h>



//Gauss-Jordan elimination

using namespace std;

//reduce a rational (numerator, denominator) pair to lowest terms,
//keeping the denominator positive
void gcd(pair<mpz_class, mpz_class>& input)
{
	mpz_class first = abs(input.first);
	mpz_class second = abs(input.second);
	if (first > second) {
		first = second;
		second = abs(input.first);
	}
	while (second != 0) {
		mpz_class temp = second;
		second = first%second;
		first = temp;
	}
	input.first /= first;
	input.second /= first;
	if (input.second < 0) {
		input.first *= -1;
		input.second *= -1;
	}
}

int main()
{
	string path("./variables.csv");
	ifstream inputFile(path);

	vector<mpz_class> answers;
	//answer line
	string rawAnswer;
	getline(inputFile, rawAnswer);
	istringstream stringy(rawAnswer);
	string number;
	while(getline(stringy, number, ',')) {
		answers.push_back(atol(number.c_str()));
	}

	//now read in the system
	vector<vector<pair<mpz_class, mpz_class> > > lines;
	while(getline(inputFile, rawAnswer)) {
		istringstream stringy(rawAnswer);
		vector<pair<mpz_class, mpz_class> > innerLine;
		while (getline(stringy, number, ',')) {
			pair<mpz_class, mpz_class>
				addPair(atol(number.c_str()), 1);
			innerLine.push_back(addPair);
		}
		lines.push_back(innerLine);
	}

	for (int i = 0; i < lines.size(); i++) {
		pair<mpz_class, mpz_class> pivot(lines[i][i].second,
			lines[i][i].first);
		if (lines[i][i].first != 0) {
			lines[i][i].first = 1;
			lines[i][i].second = 1;
		} else {
			continue;
		}
		for (int j = i + 1; j <= lines.size(); j++) {
			lines[i][j].first *= pivot.first;
			lines[i][j].second *= pivot.second;
			gcd(lines[i][j]);
		}
		for (int j = i + 1; j < lines.size(); j++) {
			pair<mpz_class, mpz_class> multiple(lines[j][i].first,
				lines[j][i].second);	
			lines[j][i] = pair<mpz_class, mpz_class>(0, 1);
			for (int k = i + 1; k <= lines.size(); k++) {
				pair<mpz_class, mpz_class>
					factor(
					multiple.first * lines[i][k].first,
					multiple.second * lines[i][k].second);
				gcd(factor);
				lines[j][k] = pair<mpz_class, mpz_class>
					(lines[j][k].first * factor.second -
					factor.first * lines[j][k].second,
					lines[j][k].second * factor.second);
				gcd(lines[j][k]);
			}
			
		}
	}
	// now eliminate upper half
	for (int i = lines.size() - 1; i > 0; i--) {
		if (lines[i][i].first == 0) {
			continue;
		}
		pair<mpz_class, mpz_class> answer = lines[i][lines.size()];
		for (int j = i - 1; j >= 0; j--) {
			pair<mpz_class, mpz_class> multiple = lines[j][i];
			if (multiple.first == 0) {
				continue;
			}
			lines[j][i] = pair<mpz_class, mpz_class>(0, 1);
			//RHS_j becomes RHS_j - multiple * answer; as rationals
			//p/q - (mf*af)/(ms*as) = (p*ms*as - mf*af*q) / (q*ms*as)
			lines[j][lines.size()].first =
				lines[j][lines.size()].first * multiple.second *
				answer.second - answer.first * multiple.first *
				lines[j][lines.size()].second;
			lines[j][lines.size()].second =
				lines[j][lines.size()].second *
				answer.second * multiple.second;
			gcd(lines[j][lines.size()]);
		}
	}
	cout << "DIAGONAL FORM" << endl;
	for (int i = 0; i < lines.size(); i++) {
		for (int j = 0; j < lines.size(); j++) {
			if (lines[i][j].first == 0) {
				cout << "0 , ";
			} else {
				cout << lines[i][j].first << "/" << lines[i][j].second << " , ";
			}
		}
		cout << " == " << lines[i][lines.size()].first << " / " << lines[i][lines.size()].second << endl;
	}

}

(NB: I know that GMP supports rationals “out of the box” but I am trying to model something that I can translate to my synthetic assembly and so hiding the details of the implementation is not what I want to do.)

How on Earth, though, am I going to model arbitrary precision code in my “assembly” code?

Running a half marathon


A year ago today I ran the Hackney Half Marathon – my first race at that distance (actually only my second true race at any distance). I was fit – a week before I’d done the Finsbury Park Parkrun in 23 minutes and 17 seconds, a PB I am yet to beat. I felt great at the start and ran fast – too fast, as I even knew at the time but couldn’t get myself to slow down properly. I ran the first 10km in what was, until a month ago, my PB. If I had kept that up I’d have finished somewhere at around 1 hour 50 minutes.

By 15km I was slowing badly, by 17km I was desperate. By 19km I could run no more and was walking. I did run most of the last 1,000 metres and I certainly ran over the line, but I was in a terrible state and nearly fainted. The finishing time – 2 hours 15 minutes – was a real disappointment, but at least I had done it. But never again, surely.

My second run at this distance was the Royal Parks Half Marathon last October. For the first 10km I followed the 1 hour 55 minutes pacer but after that I couldn’t keep up – I had not prepared as well for this race as the Hackney half and that fundamental lack of fitness had let me down, but still I wasn’t doing too badly.

Coming into the final mile both my legs buckled. I knew I had to walk. After a few hundred metres I tried running again only to get a very painful attack of cramp. I walked to about the 800 metres-to-go mark and started running again, slowly. I made it over the line. But whereas I’d got to 20k in 2 hours and a minute, it was 2 hours and 12 minutes before I finished.

And now I had really injured myself quite badly. Not badly as in get-to-hospital badly, but badly as in blisters on both feet (don’t rely on Nike’s running socks), bad chafing – something like this fixes that – and, most seriously of all, very painful Achilles tendons. I didn’t run again at all for two weeks and, effectively, my 2014 running season was over.

Roll around to 2015 and two big pieces of technology come into my life. Firstly the Garmin Forerunner 10 – a simple but very easy-to-use runners’ watch which meant I could really judge my pace properly – and then, perhaps even more importantly, a Karrimor Roller, which has worked wonders on my legs and hence my Achilles tendons.

So, last week I ran the St. Albans Half Marathon. I had a realistic target – a 5′ 50″ per kilometre pace – and a means to judge whether I was hitting it or not. That wouldn’t take me under two hours, but it would take me close and it was realistic and achievable on what was a very tough course. I prepared properly – tapering even when I wanted to run. And I did it: 2 hours, 3 minutes and 34 seconds – a 5′ 50″ pace.

I still made mistakes – too fast (about 5′ 40″ pace) for much of the start and running the end in a semi-zombified state due to, fundamentally, mental weakness. But it was good.

Even better – I’ve run 30km in the last week – so no injuries.