Strange reviews on Amazon


Messing about with convolutional neural networks (CNNs) continues to take up some of my time (in case my supervisor reads this – I also have a simulation of a many-core system running on the university’s computer atm).

I started my research here with a much-cited but really well-out-of-date book – Practical Neural Network Recipes in C++. What’s nice about that book is that it is orientated towards getting things done, though the C++ is really from the “C with classes” era.

Another book – Guide to Convolutional Neural Networks: A Practical Application to Traffic-Sign Detection and Classification – which I can access through the University of York’s library, helped fill in some of the theoretical and other gaps and also showed me why I needed to move away from the perceptron model promoted in the earlier book and towards a CNN. But like many of Springer’s books it is poorly edited and not fully or properly converted for online use.

So I’m still on the lookout for the perfect match – a book with practical coding examples that clearly explains the theory and, bluntly, is written in good English with all the maths actually reproduced in the online format (as I am just not going to be able to afford a printed copy).

In particular I want a clear explanation of how to do backpropagation in a CNN – as it’s plain that the general method outlined in “Practical Neural Network Recipes” doesn’t work beyond a fully connected layer, while the explanation in “Guide to…” is impenetrable and, actually, rather odd (as it seems to imply that we use a fixed weight for every neuron in a filter, as opposed to using fixed weights across each filter – if I have explained that properly).
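To pin down what I mean by shared weights: in a convolutional layer one small set of weights (the filter) is reused at every position of the input, so in backpropagation the gradient for each filter weight has to be accumulated across every position where the filter was applied. A minimal sketch of that accumulation (my own toy names, 1-D for simplicity – not taken from either book):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Gradient of a shared kernel weight in a 1-D convolution:
// every output position uses the same kernel, so each weight's
// gradient is the SUM over positions of upstreamGrad[i] * input[i + k].
std::vector<double> kernelGradient(const std::vector<double>& input,
                                   const std::vector<double>& upstreamGrad,
                                   std::size_t kernelSize)
{
	std::vector<double> grad(kernelSize, 0.0);
	for (std::size_t i = 0; i < upstreamGrad.size(); ++i) {
		for (std::size_t k = 0; k < kernelSize; ++k) {
			grad[k] += upstreamGrad[i] * input[i + k]; //accumulate, never overwrite
		}
	}
	return grad;
}
```

So with input {1, 2, 3, 4}, a kernel of size 2 and upstream gradients {1, 1, 1}, each shared weight picks up a summed gradient (6 and 9 here) rather than just the contribution from a single position.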

So, I’ve just had another look and came across this book … “Convolutional Neural Networks in Python: Introduction to Convolutional Neural Networks”.

This book has managed (at the time of writing) to collect two one-star reviews on the Amazon UK website:

Amazon reviews

I have no idea how fair those reviews are, but this passage from the preview version available on the website doesn’t suggest the author is yet rivalling Alan Turing:

But, here’s the odd thing. It would appear a number of “purchasers” of this book through Amazon.com are very enthusiastic about it and all felt the need to say so this very day (9 August):

odd reviews

Even more oddly, the reviews all read like spam comments I get on this blog. But I have no evidence to suggest these are anything other than genuine, if odd, comments on the book…

Free software to chop up your JPEGs


As a public service announcement – if you need some software (on Linux but may well compile and run on other systems if they support Qt) to chop a big JPEG up into smaller files, I have written this (in my case to support building a test set for a neural network).

It’s at https://github.com/mcmenaminadrian/TestSetCreator and it’s GPL licensed (v3).

Back to neural networks


Neural networks have fascinated me for a long time, though I’ve never had much luck applying them.

Back in the early and mid 1990s the UK’s trade and industry department ran a public promotional programme to industry about NNs and I signed up, I even bought a book about how to build them in C++ (and I read it, though I have to confess my understanding was partial).

My dream then was to apply the idea to politics, as a more effective way of concentrating resources on key voters. My insight was this: when out doing what political parties then called “canvassing” and now – largely for less than fully honest legal reasons, as far as I can see – call “voter ID” (in electoral law “canvassing”, i.e. seeking to persuade someone to vote one way, is more highly regulated than simply “identifying” how they intend to vote), you could quite often tell how someone would respond to you even before they opened the door. There was some quality that told you you were about to knock on a Labour voter’s door.

I still don’t know what that quality is – and after the recent election I’m not sure the insight applies any more anyway – but the point was that if you could take a mass of data and have the NN find the function for you, then you could improve your chances of getting to your potential support and so on…

But I never wrote any code and NNs seemed to go out of fashion.

Now, renamed “machine learning”, they are back in fashion in a big way and my interest has been revived. But I am not trying to write any code that will work for politics.

Instead I am exploring whether I can write any code that will look at a music score and play it back. (This is probably a silly idea for a first NN project, as music is not easy to decipher in any way as far as I can see.)

I have read another book on C++ and neural networks (and even understood it): an ancient tome from the previous time NNs were in fashion.

The first task is to actually identify which bits of scribbling on the page are musical notation at all. And I have written some code to build a training set for that – here. It might well be of use to you if you need to mark bits of a JPEG as “good” or “bad” for some other purpose, so please do get in touch if you need some help with that.

So, dear reader, I wrote a Gauss-Jordan solver


Number of steps in the Euclidean algorithm for GCD(x,y). Red points indicate relatively few steps (quick), whereas yellow, green and blue points indicate successively more steps (slow). The largest blue area follows the line y = Φx, where Φ represents the Golden ratio. (Photo credit: Wikipedia)

Been a while…

…Anyway: I am now designing a (PhD-related) simulation of a NoC system and I need to have it run some code – so I thought I’d look at code to solve a system of linear equations, and sought to build the same using Gauss-Jordan elimination.

It proved to be a lot more difficult than I had hoped: not that the maths was too complex, but producing something I could later simplify from C++ into something that looks like synthetic assembly has proved, and is still proving, very difficult indeed.

My system needs to execute a simple synthetic assembly because what I am modelling is the way the memory interactions work – so I have to get close to the metal and not hide behind a high-level language (no, not even C will do). Obviously – or at least it feels like that to me – I don’t want to have to mess about with floating point implementations, so I wanted integer arithmetic.

And that is where the first problem comes – to implement a Gauss-Jordan solver (or even to solve linear equations in the way you used to at school: the Gauss-Jordan algorithm is merely a scaling up of that), you have to divide numbers and sadly (for this purpose) not all numbers are multiples of one another – so the only way to preserve integer maths is to use rationals.

That, of course, is not such a problem. The C++ pair utility even makes it easy to model a rational out of two integers (writing a separate rational class would be overkill for something as small as this).
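For example, a sketch of the fraction arithmetic on such pairs (hypothetical helper names – the solver below inlines this logic rather than using functions like these):

```cpp
#include <cassert>
#include <utility>

// A rational as (numerator, denominator)
using Rational = std::pair<long long, long long>;

// a/b * c/d = (a*c)/(b*d)
Rational mul(const Rational& a, const Rational& b)
{
	return { a.first * b.first, a.second * b.second };
}

// a/b - c/d = (a*d - c*b)/(b*d)
Rational sub(const Rational& a, const Rational& b)
{
	return { a.first * b.second - b.first * a.second,
	         a.second * b.second };
}
```

It is exactly these repeated cross-multiplications that blow up the numerator and denominator so quickly.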

But then, quickly, another problem becomes apparent. Even when using long long types to represent the numerator and denominator of our rational fraction, the repeated multiplications needed for fractional subtraction break the integer bounds after little more than a handful of row operations (and I want to model a large system – with literally thousands of equations in each system).

Even when adding a repeatedly applied gcd function to apply Euclid’s algorithm to the rational fractions, the numbers go out of range quickly.

So I have been forced to rely on an arbitrary precision library – GNU’s GMP. The code now works well (you can see it at its GitHub repo) and the solver code is below:

#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
#include <vector>
#include <cstdlib>
#include <utility>
#include <gmpxx.h>



//Gauss-Jordan elimination

using namespace std;

//reduce a rational (first/second) to lowest terms in place
void gcd(pair<mpz_class, mpz_class>& input)
{
	mpz_class first = abs(input.first);
	mpz_class second = abs(input.second);
	if (first > second) {
		first = second;
		second = abs(input.first);
	}
	while (second != 0) {
		mpz_class temp = second;
		second = first%second;
		first = temp;
	}
	input.first /= first;
	input.second /= first;
	if (input.second < 0) {
		input.first *= -1;
		input.second *= -1;
	}
}

int main()
{
	string path("./variables.csv");
	ifstream inputFile(path);

	vector<mpz_class> answers;
	//answer line
	string rawAnswer;
	getline(inputFile, rawAnswer);
	istringstream stringy(rawAnswer);
	string number;
	while(getline(stringy, number, ',')) {
		answers.push_back(atol(number.c_str()));
	}

	//now read in the system
	vector<vector<pair<mpz_class, mpz_class> > > lines;
	while(getline(inputFile, rawAnswer)) {
		istringstream stringy(rawAnswer);
		vector<pair<mpz_class, mpz_class> > innerLine;
		while (getline(stringy, number, ',')) {
			pair<mpz_class, mpz_class>
				addPair(atol(number.c_str()), 1);
			innerLine.push_back(addPair);
		}
		lines.push_back(innerLine);
	}

	for (int i = 0; i < lines.size(); i++) {
		if (lines[i][i].first == 0) {
			continue;
		}
		//reciprocal of the pivot element
		pair<mpz_class, mpz_class> pivot(lines[i][i].second,
			lines[i][i].first);
		lines[i][i].first = 1;
		lines[i][i].second = 1;
		for (int j = i + 1; j <= lines.size(); j++) {
			lines[i][j].first *= pivot.first;
			lines[i][j].second *= pivot.second;
			gcd(lines[i][j]);
		}
		for (int j = i + 1; j < lines.size(); j++) {
			pair<mpz_class, mpz_class> multiple(lines[j][i].first,
				lines[j][i].second);	
			lines[j][i] = pair<mpz_class, mpz_class>(0, 1);
			for (int k = i + 1; k <= lines.size(); k++) {
				pair<mpz_class, mpz_class>
					factor(
					multiple.first * lines[i][k].first,
					multiple.second * lines[i][k].second);
				gcd(factor);
				lines[j][k] = pair<mpz_class, mpz_class>
					(lines[j][k].first * factor.second -
					factor.first * lines[j][k].second,
					lines[j][k].second * factor.second);
				gcd(lines[j][k]);
			}
			
		}
	}
	// now eliminate upper half
	for (int i = lines.size() - 1; i > 0; i--) {
		if (lines[i][i].first == 0) {
			continue;
		}
		pair<mpz_class, mpz_class> answer = lines[i][lines.size()];
		for (int j = i - 1; j >= 0; j--) {
			pair<mpz_class, mpz_class> multiple = lines[j][i];
			if (multiple.first == 0) {
				continue;
			}
			lines[j][i] = pair<mpz_class, mpz_class>(0, 1);
			//RHS_j := RHS_j - multiple * RHS_i, over the common
			//denominator; the answer.second factor on the first
			//term matters whenever RHS_i is not an integer
			lines[j][lines.size()].first =
				lines[j][lines.size()].first * multiple.second *
				answer.second -
				answer.first * multiple.first *
				lines[j][lines.size()].second;
			lines[j][lines.size()].second =
				lines[j][lines.size()].second *
				answer.second * multiple.second;
			gcd(lines[j][lines.size()]);
		}
	}
	cout << "DIAGONAL FORM" << endl;
	for (int i = 0; i < lines.size(); i++) {
		for (int j = 0; j < lines.size(); j++) {
			if (lines[i][j].first == 0) {
				cout << "0 , ";
			} else {
				cout << lines[i][j].first << "/" << lines[i][j].second << " , ";
			}
		}
		cout << " == " << lines[i][lines.size()].first << " / " << lines[i][lines.size()].second << endl;
	}

}

(NB: I know that GMP supports rationals “out of the box” but I am trying to model something that I can translate to my synthetic assembly and so hiding the details of the implementation is not what I want to do.)

How on Earth, though, am I going to model arbitrary precision code in my “assembly” code?

Struggling


Die of an Intel 80486DX2 microprocessor (actual size: 12×6.75 mm) in its packaging. (Photo credit: Wikipedia)

Been a while since I’ve written here – been avoiding writing about politics, which has obviously not been so great for me in the last couple of weeks… but now I have something else to ruminate on.

I have reached a milestone, or perhaps basecamp, in my PhD research: having a model for memory management that needs further exploration. (Hopefully there may even be a paper on this soon.)

Some of that exploration will have to be in hardware, and that’s really beyond me but I can and should build a software model to test how a many core system built using this new model might operate.

So far I have been testing or proving concepts with OVPSim but it does not allow me to build a true asynchronous multi-core model, so I need to do that in software myself.

But where to begin – I have a list of classes that you might want to have in C++:

  • Computer – which would aggregate…
    • DRAM
    • Storage
    • NoC – which would aggregate…
      • Mesh
      • Tiles – which would aggregate…
        • CPU
        • Cache
        • Ports (to the Mesh)

I hope you can see how quickly this becomes complex – and all we are talking about here is a simple software framework to allow me to do the research (ie., delivering the software, complex as it is, is only the very start.)
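As a very rough sketch, the aggregation might look something like this in C++ (all names hypothetical – this is the shape of the problem, not a design):

```cpp
#include <cassert>
#include <vector>

// Hypothetical skeleton of the aggregation hierarchy described above:
// Computer owns DRAM, Storage and a NoC; the NoC owns a Mesh and
// Tiles; each Tile owns a CPU, a Cache and its Ports to the Mesh.
struct CPU {};
struct Cache {};
struct Port {};

struct Tile {
	CPU cpu;
	Cache cache;
	std::vector<Port> ports; //connections to the mesh
};

struct Mesh {};

struct NoC {
	Mesh mesh;
	std::vector<Tile> tiles;
};

struct DRAM {};
struct Storage {};

struct Computer {
	DRAM dram;
	Storage storage;
	NoC noc;
};
```

Even this empty skeleton hints at the real difficulty: the behaviour (message passing, cache coherence, timing) has to thread through every one of these layers.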

I am struggling to know where to begin – C++ seems like the logical choice for this, but it’s not proving to be much fun. Particularly because my CPU class has to be able to “execute” some code – I thought about using a DSL but may stick to the XML output I got from my hacked Valgrind Lackey – as at least I can then use existing XML engines.

Should I build from the XML up – eg., get a CPU class that can hack the XML and pass the requests up the chain (eg via the cache up to the Mesh and up to the DRAM etc), or what?

Pointers versus references


Image for teaching pointers in computer programming (Photo credit: Wikipedia)

Some people don’t like pointers – and for that reason, I think, we have references in C++. But as a confirmed pointer person, I find references very hard going.

I had a piece of C++ code that did this:

PartialPage& DoubleTree::oldestPage()
{
	PartialPage& pageToKill = pageTree.begin()->second;
	long timeToKill = pageTree.begin()->second.getTime();
	map<long, PartialPage>::iterator itOld;
	for (itOld = pageTree.begin(); itOld != pageTree.end(); itOld++) {
		if (itOld->second.getTime() < timeToKill) {
			timeToKill = itOld->second.getTime();
			pageToKill = itOld->second;
		}
	}
	return pageToKill;
}

This produced rubbish results – because re-assigning the reference didn’t make it refer to a new element of the map. Essentially you cannot re-seat a reference in C++ at all: assignment goes through the reference to the object it already names.
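A tiny demonstration of that rule, for anyone who wants to convince themselves:

```cpp
#include <cassert>

// Assignment to a reference writes through it to the object it
// already names; it never re-binds the reference to a new object.
bool canReseatReference()
{
	int a = 1, b = 2;
	int& r = a;      //r is an alias for a
	r = b;           //copies b's value into a; r still refers to a
	return &r == &b; //true only if the reference had been re-bound
}
```

The function always returns false – and inside the loop above, each “re-assignment” was silently overwriting the first map element instead.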

Switching to pointers fixed the problem though.

PartialPage* DoubleTree::oldestPage()
{
	PartialPage* pageToKill = &(pageTree.begin()->second);
	long timeToKill = pageTree.begin()->second.getTime();
	map<long, PartialPage>::iterator itOld;
	for (itOld = pageTree.begin(); itOld != pageTree.end(); itOld++) {
		if (itOld->second.getTime() < timeToKill) {
			timeToKill = itOld->second.getTime();
			pageToKill = &(itOld->second);
		}
	}
	return pageToKill;
}


Racey code can damage your (mental) health


Running the Hackney half marathon

I’ve had a tough week. Ran a half marathon (quite badly, but I finished – see the picture) on Sunday, hobbled around York University (blister on my foot) on Monday before returning to London and coping with the added stress of a new job.

All the while the failure of my latest piece of code to work was annoying me too – eating away at me and making me wonder if the whole PhD thing was a bit of Quixotic nonsense or even Panzan stupidity.

Anyone who has chased down a bug for days will know the feeling – the root cause must be out there somewhere, even if you have been through your code dozens of times and can see nothing wrong.

Sitting on a long commute to my place of (temporary) work this morning I realised that my problem could only be down to one thing – a race condition.

I am emulating an LRU/2 type page replacement policy – with two page queues – a “high” and a “low”: pages are initially assigned to low, but can be pushed into high on a second reference. Pages only leave via low (they can be pushed out of high into low if we run out of space in high and so on).

With one thread running there was no problem, but when a second thread came on board the application fell over – and I realised it was because the manipulation of the LRU queues was not atomic.
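One way of making the queue manipulation atomic is to put both queues behind a single mutex, so a promotion from “low” to “high” can never interleave with an eviction – a sketch with illustrative names, not my actual code:

```cpp
#include <cstddef>
#include <list>
#include <mutex>

// Illustrative two-queue (LRU/2-style) structure: all manipulation of
// both queues is serialised behind one lock, so no thread can observe
// a page mid-move between "low" and "high".
class LRUQueues {
public:
	void insert(long page)
	{
		std::lock_guard<std::mutex> guard(lock);
		low.push_back(page); //new pages always enter via low
	}
	void promote(long page)
	{
		std::lock_guard<std::mutex> guard(lock);
		low.remove(page);    //second reference: move low -> high
		high.push_back(page);
	}
	std::size_t lowSize()
	{
		std::lock_guard<std::mutex> guard(lock);
		return low.size();
	}
private:
	std::mutex lock;
	std::list<long> low;
	std::list<long> high;
};
```

The single lock is crude – it serialises all page traffic – but it makes the invariant (a page is in exactly one queue) easy to reason about, which is what matters first.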

And, a few dozen code changes later (done on the reverse journey back to London) – plus some accounting for C++ template strangeness (it seems that taking a non-const iterator across a map means the map itself cannot be const – that took me a while to work out) – the problem has gone away and suddenly academic endeavour feels worth it again.
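The template strangeness, for what it’s worth: a plain map iterator can only be obtained from a non-const map, so a function taking the map by const reference has to use a const_iterator instead (a small sketch, names my own):

```cpp
#include <map>

// A const map only hands out const_iterator (via begin()/cbegin());
// read-only traversal is fine, but a plain iterator would not compile.
long sumValues(const std::map<long, long>& m)
{
	long total = 0;
	for (std::map<long, long>::const_iterator it = m.cbegin();
	     it != m.cend(); ++it) {
		total += it->second; //reads are allowed through const_iterator
	}
	return total;
}
```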