So, dear reader, I wrote a Gauss-Jordan solver


Number of steps in the Euclidean algorithm for GCD(x,y). Red points indicate relatively few steps (quick), whereas yellow, green and blue points indicate successively more steps (slow). The largest blue area follows the line y = Φx, where Φ represents the Golden ratio. (Photo credit: Wikipedia)

Been a while…

…Anyway: I am now designing (as part of my PhD) a simulation of a NoC system and I need it to run some code – so I thought I would look at solving a system of linear equations and set out to build a solver using Gauss-Jordan elimination.

It proved to be a lot more difficult than I had hoped: not that the maths was too complex, but producing something in C++ that I could later simplify into something resembling synthetic assembly has proved, and is proving, very difficult indeed.

My system needs to execute a simple synthetic assembly because what I am modelling is the way the memory interactions work – so I have to get close to the metal and not hide behind a high-level language (no, not even C will do). Obviously – or at least it feels like that to me – I don’t want to have to mess about with floating point implementations, so I wanted integer arithmetic.

And that is where the first problem comes in – to implement a Gauss-Jordan solver (or even to solve linear equations in the way you used to at school: the Gauss-Jordan algorithm is merely a scaling up of that), you have to divide numbers, and sadly (for this purpose) not all numbers are multiples of one another – so the only way to preserve integer maths is to use rationals.

That, of course, is not such a problem. The C++ pair utility even makes it easy to model a rational out of two integers (writing a separate rational class would be overkill for something as small as this).
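For illustration, the arithmetic on such pair-based rationals is only a few lines (this is a sketch of the pre-GMP idea, not the final code below – the names are my own):

//Sketch only: a rational modelled as std::pair<long long, long long>
//(numerator, denominator) - the pre-GMP version of the idea
#include <utility>

typedef std::pair<long long, long long> rational;

//(a/b) * (c/d) = ac/bd
rational multiply(const rational& x, const rational& y)
{
	return rational(x.first * y.first, x.second * y.second);
}

//(a/b) - (c/d) = (ad - cb)/bd - note the cross-multiplication,
//which is what makes the numerators and denominators grow so fast
rational subtract(const rational& x, const rational& y)
{
	return rational(x.first * y.second - y.first * x.second,
		x.second * y.second);
}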

But then another problem quickly becomes apparent. Even when using long long types to represent the numerator and denominator of our rational fraction, the repeated multiplications needed for fractional subtraction break the integer bounds after little more than a handful of row operations (and I want to model a large system – with literally thousands of equations in each system).

Even with a gcd function repeatedly applying Euclid’s algorithm to reduce the fractions, the numbers go out of range quickly.

So I have been forced to rely on an arbitrary precision library – GNU’s GMP. The code now works well (you can see it in its GitHub repo) and the solver code is below:

#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
#include <vector>
#include <cstdlib>
#include <utility>
#include <gmpxx.h>



//Gauss-Jordan elimination

using namespace std;

//reduce a rational (numerator, denominator) pair to lowest terms using
//Euclid's algorithm and normalise the sign so the denominator is positive
void gcd(pair<mpz_class, mpz_class>& input)
{
	mpz_class first = abs(input.first);
	mpz_class second = abs(input.second);
	if (first > second) {
		first = second;
		second = abs(input.first);
	}
	while (second != 0) {
		mpz_class temp = second;
		second = first%second;
		first = temp;
	}
	input.first /= first;
	input.second /= first;
	if (input.second < 0) {
		input.first *= -1;
		input.second *= -1;
	}
}

int main()
{
	string path("./variables.csv");
	ifstream inputFile(path);

	vector<mpz_class> answers;
	//answer line
	string rawAnswer;
	getline(inputFile, rawAnswer);
	istringstream stringy(rawAnswer);
	string number;
	while(getline(stringy, number, ',')) {
		answers.push_back(atol(number.c_str()));
	}

	//now read in the system
	vector<vector<pair<mpz_class, mpz_class> > > lines;
	while(getline(inputFile, rawAnswer)) {
		istringstream stringy(rawAnswer);
		vector<pair<mpz_class, mpz_class> > innerLine;
		while (getline(stringy, number, ',')) {
			pair<mpz_class, mpz_class>
				addPair(atol(number.c_str()), 1);
			innerLine.push_back(addPair);
		}
		lines.push_back(innerLine);
	}

	//forward elimination: normalise each pivot to 1 and clear the column below it
	for (int i = 0; i < lines.size(); i++) {
		pair<mpz_class, mpz_class> pivot(lines[i][i].second,
			lines[i][i].first);
		if (lines[i][i].first != 0) {
			lines[i][i].first = 1;
			lines[i][i].second = 1;
		} else {
			continue;
		}
		//scale the rest of row i (including the answer column) by the reciprocal of the pivot
		for (int j = i + 1; j <= lines.size(); j++) {
			lines[i][j].first *= pivot.first;
			lines[i][j].second *= pivot.second;
			gcd(lines[i][j]);
		}
		//eliminate column i from every row below row i
		for (int j = i + 1; j < lines.size(); j++) {
			pair<mpz_class, mpz_class> multiple(lines[j][i].first,
				lines[j][i].second);	
			lines[j][i] = pair<mpz_class, mpz_class>(0, 1);
			for (int k = i + 1; k <= lines.size(); k++) {
				pair<mpz_class, mpz_class>
					factor(
					multiple.first * lines[i][k].first,
					multiple.second * lines[i][k].second);
				gcd(factor);
				lines[j][k] = pair<mpz_class, mpz_class>
					(lines[j][k].first * factor.second -
					factor.first * lines[j][k].second,
					lines[j][k].second * factor.second);
				gcd(lines[j][k]);
			}
			
		}
	}
	// now eliminate upper half
	for (int i = lines.size() - 1; i > 0; i--) {
		if (lines[i][i].first == 0) {
			continue;
		}
		pair<mpz_class, mpz_class> answer = lines[i][lines.size()];
		for (int j = i - 1; j >= 0; j--) {
			pair<mpz_class, mpz_class> multiple = lines[j][i];
			if (multiple.first == 0) {
				continue;
			}
			lines[j][i] = pair<mpz_class, mpz_class>(0, 1);
			lines[j][lines.size()].first =
				lines[j][lines.size()].first * multiple.second
				* answer.second - answer.first * multiple.first *
				lines[j][lines.size()].second;
			lines[j][lines.size()].second =
				lines[j][lines.size()].second *
				answer.second * multiple.second;
			gcd(lines[j][lines.size()]);
		}
	}
	cout << "DIAGONAL FORM" << endl;
	for (int i = 0; i < lines.size(); i++) {
		for (int j = 0; j < lines.size(); j++) {
			if (lines[i][j].first == 0) {
				cout << "0 , ";
			} else {
				cout << lines[i][j].first << "/" << lines[i][j].second << " , ";
			}
		}
		cout << " == " << lines[i][lines.size()].first << " / " << lines[i][lines.size()].second << endl;
	}

}

(NB: I know that GMP supports rationals “out of the box” but I am trying to model something that I can translate to my synthetic assembly and so hiding the details of the implementation is not what I want to do.)
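For comparison, this is roughly what the arithmetic looks like if you do let GMP’s mpq_class do the work – a sketch only, and exactly the kind of hiding of detail I want to avoid:

//Sketch: GMP's built-in rational type, shown for contrast only
#include <iostream>
#include <gmpxx.h>

int main()
{
	mpq_class a(1, 3), b(5, 7);	//1/3 and 5/7
	a.canonicalize();		//needed if numerator and denominator might share a factor
	b.canonicalize();
	mpq_class c = a - b * a;	//arithmetic keeps the result in canonical form
	std::cout << c << std::endl;	//prints 2/21
}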

How on Earth, though, am I going to model arbitrary precision code in my “assembly” code?


Not even on Wikipedia…


If you are old enough, like me, to remember the Cold War before the days of glasnost and perestroika, you will also recall that one of the strategic weaknesses of the Soviet Union was that it was forced to steal and copy advanced western technologies, seemingly unable to invent them itself.

In many cases that was plainly true – spies stole the secrets of the Manhattan Project to give Stalin his atomic bomb (though Soviet scientists devised H-bomb mechanisms independently).

But in the case of computing, the decision to copy the west was a deliberate and conscious one, taken despite real skill and specialism existing inside the Soviet Union. A while back I wrote about how Soviet computer scientists appeared to be some years ahead of the west in the study of certain algorithms that are important for operating system management. In hardware the Soviets did not have a lead, but they certainly had real know-how – the first electronic computer in continental Europe was built in the Soviet Union and was based on independent research. What killed that was a decision by the Soviet leadership to copy out-of-date IBM machines instead of continuing with their own research and development.

All this is recounted, in novelised form, in the brilliant Red Plenty. The book highlights the role of Sergey Alekseevich Lebedev, the Ukrainian known as “the Soviet Turing“. Like Turing, Lebedev was taken from his work (as an electrical/electronic engineer rather than a mathematician) by the war and played an important role in Soviet tank design. Afterwards he returned to his studies with a vengeance and by the mid-fifties he was building some of the world’s fastest and, arguably, best engineered computer systems – beginning with the so-called MESM (a Russian acronym for “Small Electronic Calculating Machine”).

Yet today he does not even appear to rate an entry in Wikipedia.

The Soviet computer industry was not just killed by poor decisions at the top, but by the nature of the Soviet system. Without a market there was no drive to standardise or commoditise computer systems and so individual Soviet computers were impressive but the “industry” as a whole was a mess. Hopes that computers could revolutionise Soviet society also fell flat as the centralised planning system ran out of steam. Switching to copying IBM seemed like a way of getting a standardised system off the shelf, but it was a blow from which Soviet computing never recovered.

In continued praise of “Programming Pearls”


Algorithm City

I have read two books in the last year that have fundamentally changed the way I think about computers and programming – The Annotated Turing and Programming Pearls.

Programming Pearls in particular continues to inspire me in the way it makes you think about building better algorithms and using data structures better – and here’s a short tale about what I have just done (and why I did it) with my valext program.

Valext runs a program under step tracing to allow the detailed examination of patterns of memory use – interrogating the /proc/pid/pagemap at every step.

The way I was doing this was to query the /proc/pid/maps, find the blocks that were being mapped and then seek through /proc/pid/pagemap, reading off the page status.
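For illustration, reading a single pagemap entry looks roughly like this – a simplified sketch (in C++ rather than the C of valext, and assuming 4K pages), not the actual valext code. Each virtual page has one 64-bit entry in /proc/pid/pagemap: bit 63 flags whether the page is present in RAM and, for present pages, the low 55 bits hold the page frame number.

//Sketch only: look up the pagemap entry for one virtual address
#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>

bool readPagemapEntry(const std::string& pid, uint64_t vaddr, uint64_t& entry)
{
    const uint64_t pageSize = 4096;	//assumption: 4K pages
    std::ifstream pagemap("/proc/" + pid + "/pagemap", std::ios::binary);
    if (!pagemap)
        return false;
    //one 64-bit entry per virtual page, so seek to (page number * 8)
    pagemap.seekg((vaddr / pageSize) * sizeof(uint64_t));
    pagemap.read(reinterpret_cast<char*>(&entry), sizeof(entry));
    return bool(pagemap);
}

int main(int argc, char* argv[])
{
    if (argc < 3)
        return 1;
    uint64_t vaddr = std::stoull(argv[2], nullptr, 16);
    uint64_t entry = 0;
    if (readPagemapEntry(argv[1], vaddr, entry))
        std::cout << "present: " << ((entry >> 63) & 1)
                  << " pfn: " << (entry & ((1ULL << 55) - 1)) << std::endl;
    return 0;
}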

For every page that was mapped I would build a data structure and stick it into a linked list:

struct blocklist {
    uint64_t address;
    struct blocklist *nextblock;
};
...
/* append one entry per page - a separate malloc for every single page */
for (i = startaddr; i < endaddr; i++)
{
    block = malloc(sizeof (struct blocklist));
    block->address = i;
    block->nextblock = NULL;
    if (*lastblock == NULL){
        *lastblock = block;
        *header = block;
    } else {
        (*lastblock)->nextblock = block;
        *lastblock = block;
    }
}

But the problem with this code – an \mathcal{O}(n) algorithm – is that performance stinks. I was running it on a very underpowered laptop and got to about four days of wallclock time, in which time about 2 million steps had been taken.

So today I rewrote it with an essentially \mathcal{O}(1) algorithm – I allocate an array of arbitrary size (512 elements in the current code) and fill that up, with a 0 value being the guard (ie marking the end). If more than 512 elements are needed then another 512 will be allocated and chained on to the end, and so on…


struct blockchain {
    int size;
    uint64_t *head;
    struct blockchain *tail;
};

uint64_t* blockalloc(int size)
{
    uint64_t *buf = calloc(size, sizeof(uint64_t));
    return buf;
}

struct blockchain *newchain(int size)
{
    struct blockchain *chain = NULL;
    chain = calloc(1, sizeof(struct blockchain));
    if (!chain)
        return NULL;
    chain->head = blockalloc(size);
    if (chain->head == NULL) {
        free(chain);
        return NULL;
    }
    chain->size = size;
    return chain;
}

/* recursively free the list */
void cleanchain(struct blockchain *chain)
{
    if (!chain)
        return;
    cleanchain(chain->tail);
    free(chain->head);
    free(chain);
    return;
}

/* parse one line of /proc/pid/maps and append its pages to the chain */
struct blockchain* getnextblock(struct blockchain** chain,
struct blockchain** header, char* buf)
{
    int match, t = 0;
    uint64_t startaddr;
    uint64_t endaddr;
    uint64_t i;
    const char* pattern;
    regex_t reg;
    regmatch_t addresses[3];

    pattern = "^([0-9a-f]+)-([0-9a-f]+)";
    if (regcomp(&reg, pattern, REG_EXTENDED) != 0)
        goto ret;
    match = regexec(&reg, buf, (size_t)3, addresses, 0);
    if (match == REG_NOMATCH || match == REG_ESPACE)
         goto cleanup;
    startaddr = strtoul(&buf[addresses[1].rm_so], NULL, 16) >> PAGESHIFT;
    endaddr = strtoul(&buf[addresses[2].rm_so], NULL, 16) >> PAGESHIFT;
    if (*chain == NULL) {
        *chain = newchain(MEMBLOCK);
        if (*chain == NULL)
            goto cleanup;
        *header = *chain;
    }
    for (i = startaddr; i < endaddr; i++)
    {
        if (t >= MEMBLOCK) {
            struct blockchain *nxtchain = newchain(MEMBLOCK);
            if (!nxtchain)
                goto cleanup;
            (*chain)->tail = nxtchain;
            *chain = nxtchain;
            t = 0;
        }
        (*chain)->head[t] = i;
        t++;
    }
cleanup:
    regfree(&reg);
ret:
    return *chain;
}

As I am only testing this code now I don’t know if it will be faster – though everything tells me that it should be. (I am not entirely clear whether this memory allocation issue is the real bottleneck in the program, or whether it is just that a Pentium III is a Pentium III and there is nothing much I can do about that.)

What I do know is that if it was not for the inspiration of Programming Pearls I probably would not even have bothered trying.

“Basically, you would be able to compute anything you wanted”


The quote that forms the title here comes from Lance Fortnow, a computer scientist at Northwestern University, in an article (here – subscription required) in the current edition of the New Scientist on the P = NP question.

It’s an odd statement for a computer scientist to make – almost all real numbers are fundamentally incomputable: the computable numbers are countable, numbering only \aleph_{0} (aleph null – the smallest of Cantor’s hypothesised infinities), while there are uncountably many real numbers – most of them transcendental – between 0 and 1 (or in any other interval).

But besides that oddity it is actually a very good article – calling the world where P = NP Algorithmica – “the computing nirvana”.

I have written before of how much I hope we do live in Algorithmica, though the consensus is we live in a world of NDAlgorithmica (ND for non-deterministic).

The article’s beauty is that it also examines the states between the two: what if, for instance, we discovered that the class of P problems were identical to the class of NP problems but that we could not find the P algorithm, or that the P algorithm was of such a degree of complexity it “would not amount to a hill of beans”?

The opposite of science, but could be fun


Venn diagram for P, NP, NP-complete and NP-hard problems (image via Wikipedia)

I have created a prediction market on Vladimir Romanov’s P=NP proposal. Starting price for it being proved correct by the end of 2011 is very low – or the odds are long, depending on how you look at it (10 cents wins $99.90 ie 999/1).

Have a bet – it’s free and it’s fun.

It is here.

Possibly the most important news you will read this year


An example of how public and private key encryption works (image via Wikipedia)

Apparently P==NP. (So public key encryption – used for internet commerce – is broken and many more problems than we previously thought are quickly solvable).

At least that is the suggestion you can read here. Slashdot also has this here.

If it’s true then the revolution has just begun. If it’s false, well, tomorrow’s another day…

What if P = NP?


Update (5 March): read a better version here.

I admit I am now going slightly out of my depth, but I will try to explain what this is about and why it is interesting.

It is known that computers can solve some problems in what is called “polynomial time”: that is to say a finite time that is bounded by a polynomial in the size of the input. The key thing is that these problems are computable using mechanical steps (ie algorithms) in a way that is (sometimes euphemistically) described as “quickly”.

These can be simple problems – like what is the sum of the first ten integers – or more complex ones, such as creating self-balancing trees, sorting database records alphabetically and so on.
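(To take the first of those as a worked example: the sum of the first n integers is \frac{n(n+1)}{2}, so for n = 10 that is \frac{10 \times 11}{2} = 55 – one multiplication and one division rather than a long chain of additions, and comfortably polynomial in the size of the input.)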

These are the so-called “P” (for polynomial time class) problems. Here’s a definition:

A problem is assigned to the P (polynomial time) class if there exists at least one algorithm to solve that problem, such that the number of steps of the algorithm is bounded by a polynomial in n, where n is the length of the input.

Then there is another class of problems which seem fiendishly difficult to solve but for which it is relatively simple to prove the correctness of any solution offered. A proposed solution can be checked in polynomial time, and the problems can be solved in polynomial time by a non-deterministic Turing machine (a theoretical machine which, in effect, explores all the candidate solutions at once) – but nobody knows an algorithm that finds a solution in polynomial time on an ordinary, deterministic machine. In the worst case the only known way to find a solution is an exhaustive search of the candidates – in other words a form of “brute force”. These are the NP (non-deterministic polynomial time) problems.
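To make that concrete, here is a small sketch of my own (in C++, using the subset sum problem – an illustration, not something from the article): checking a proposed answer takes a handful of steps, but the only obvious way of finding one is to try every subset.

//Sketch: subset sum - easy to check a proposed answer, apparently hard to find one
#include <iostream>
#include <vector>

//checking a certificate (a set of indices) is polynomial: just add up and compare
bool verify(const std::vector<long>& nums, const std::vector<size_t>& subset, long target)
{
	long sum = 0;
	for (size_t idx : subset)
		sum += nums[idx];
	return sum == target;
}

//finding a certificate by brute force means trying all 2^n subsets - exponential in n
bool search(const std::vector<long>& nums, long target)
{
	size_t n = nums.size();	//fine for small n only
	for (size_t mask = 0; mask < (1u << n); mask++) {
		long sum = 0;
		for (size_t i = 0; i < n; i++)
			if (mask & (1u << i))
				sum += nums[i];
		if (sum == target)
			return true;
	}
	return false;
}

int main()
{
	std::vector<long> nums = {3, 34, 4, 12, 5, 2};
	std::cout << search(nums, 9) << std::endl;		//1: {4, 5} sums to 9
	std::cout << verify(nums, {2, 4}, 9) << std::endl;	//1: indices of 4 and 5
}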

Now most mathematicians think that NP does not equal P, and that may or may not be a good thing, as much of our internet commerce relies on encryption built on problems (such as factoring very large numbers) which are thought to be too hard to solve in polynomial time.

(In Afghanistan in 2001 US cryptanalysts seemingly brute forced a Taliban Windows NT box but it used much weaker encryption than most electronic commerce.)

But what if it were the case that all seemingly NP problems were actually P problems? There are a lot of people studying this issue – but according to the New Scientist (their Christmas edition, the first in my subscription and delivered this morning, inspired me to write this) we should expect to wait at least until 2024 for an answer (by which point the problem – first formulated in 1971 – will have reached the age at which 50% of major mathematical problems will have been solved).

Some problems once thought to be outside P have already been shown to be in P and there was a big fuss earlier in 2010 when a draft proof of P = NP (edit: it was actually P != NP) was published (the proof was flawed). And unlike, say, Fermat’s last theorem, proving P = NP is likely to have more or less immediate effects on the lives of many of us (how would you feel if you knew that it was possible, even if not likely, that someone had developed a way to quickly crack open all your internet credit card transactions?)

Of course, proving P = NP for all problems does not necessarily mean we will have determined polynomial time based solutions for all the current NP problems out there. But I would expect it would quickly lead to the solution of a multitude of them.

And, actually, I think the benefits to humanity would be enormous. The most likely immediate effects would be improvements in computing and operating system efficiency: faster computers for less money and/or less energy consumption would be a huge benefit. From that alone many other benefits would flow in terms of information availability and universality.

530 lines of Javascript


I have just written that amount of code in what I persist in thinking of as a toy language (it was actually somewhat longer until I refactored the code to group some common functions together).

I had to do this for a coursework exercise – a lot of effort, to be honest, for what is at stake: processing a rather large XML file with some XSL. The Javascript essentially manages the parameters.

At the end my conclusion is that I don’t really see why anybody would want to write that much client code if they could possibly help it. Of course it transfers the computational burden to the client – but at the cost of hundreds of lines of interpreted code which is essentially under the control of the people who write the engines in the Firefox and IE browsers. In the real world that points towards a support nightmare.

Having written a fair bit of Perl (and AJAX) stuff in the past the whole thing felt unnatural – dozens of lines, much of it designed to handle the differences between the browser engines, that could have been handled simply on the server side.

One thing that I was convinced of was the potential power of XSLT: though I was not quite prepared for the revelation that it is Turing complete (ie it would be possible to write some XSL to process any algorithm/task solvable through a finite number of mechanical steps). Though I shudder to think how big a stylesheet would be required to handle all but the smallest of tasks.

But the potential power of XSLT is not the same as being able to think of many practical uses for it!

Algorithms matter … a lot


One of the things that most annoyed me about the article on graphics was the author’s subsequent attempts (once he came under criticism) to justify bad technological advice through the excuse that falling hardware prices meant that his advice to use inappropriately optimised formats did not matter.

For the truth is that cheaper hardware expands the use of technology at both the “fat” end (ie your 4GB quad core desktop) and the “thin” end – your mobile phone and lower still. As computing becomes ubiquitous then appropriate optimisation will be as important as it ever was.

And appropriate optimisation means efficient and effective algorithms.

In fact, suggests a report quoted here, improvement in algorithm efficiency has contributed more – by an order of magnitude – to advances in some branches of computing than falling hardware prices (I am using price here as an analogue for the general Moore’s Law advance).

The report only appears to cite two specific examples of how algorithmic improvement has far outstripped the silicon, so I would be wary of the general claim that algorithms contributed a 30–43 times greater improvement to computing efficiency than hardware did in the 15 years between 1988 and 2003 (something that did not bother the Slashdot summariser, so it is now likely to become the accepted myth) – but it is a remarkable finding nonetheless.

Some advice you should ignore


I have an interest in graphic formats – I wrote the perl package Image::Pngslimmer a while back when I was hacking some perl database code that delivered (via Ajax) graphs and photographs.

I built the whole website for fun and learning purposes and so therefore used PNG graphics when JPEGs would have (for the photographs at least) been more appropriate.

So an article that begins:

Virtually every developer will use bitmaps at times in their programming. Or if not in their programming, then in a website, blog, or family photos. Yet many of us don’t know the trade-offs between a GIF, JPEG, or PNG file – and there are some major differences there. This is a short post on the basics which will be sufficient for most, and a good start for the rest. Most of this I learned as a game developer (inc. Enemy Nations) where you do need a deep understanding of graphics.

has my attention.

But what a load of rubbish it turns out to be. The author plainly doesn’t know what the “trade-offs” are either as he states:

PNG – 32-bit (or less), lossless compression, small file sizes – what’s not to like. Older versions of some browsers (like Internet Explorer) would display the transparent pixels with an off-white color but the newer versions handle it properly. Use this (in 32-bit mode using 8 bits for transparency) for everything.

…and…

JPEG – 24-bit only (i.e. no transparency), lossy (can be lossless but compressions drops a lot), small file sizes. There is no reason to use this format unless you have significant numbers of people using old browsers. It’s not a bad format, but it is inferior to PNG with no advantages.

This is seriously bad advice and it is also technically wrong.

First, PNG is not necessarily a lossless format. Many PNGs are indeed lossless, but they do not have to be. A substantial part of the code in the Image::Pngslimmer package is about using the median cut algorithm to produce a lossy PNG out of a lossless one: it builds a best-match set of colours for the image and then converts each pixel to the colour in that set which is closest to it in RGB space (the median cut algorithm ensures that there are more members of the set in the parts of RGB space where the original image has more colours).
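For the curious, the idea behind median cut looks roughly like this – a simplified sketch in C++ rather than the Perl of Image::Pngslimmer, with names of my own invention:

//Sketch of median-cut colour quantisation - the idea only, not Image::Pngslimmer
#include <algorithm>
#include <cstdint>
#include <vector>

struct Colour { uint8_t r, g, b; };

//recursively split the pixel set along its widest RGB channel at the median;
//each final box contributes one palette entry (the average colour of its pixels)
static void medianCut(std::vector<Colour>& pixels, size_t begin, size_t end,
		      int depth, std::vector<Colour>& palette)
{
	if (depth == 0 || end - begin < 2) {
		long r = 0, g = 0, b = 0;
		size_t n = (end > begin) ? end - begin : 1;
		for (size_t i = begin; i < end; i++) {
			r += pixels[i].r; g += pixels[i].g; b += pixels[i].b;
		}
		palette.push_back({ (uint8_t)(r / n), (uint8_t)(g / n), (uint8_t)(b / n) });
		return;
	}
	//find the channel with the greatest range in this box
	uint8_t lo[3] = {255, 255, 255}, hi[3] = {0, 0, 0};
	for (size_t i = begin; i < end; i++) {
		uint8_t c[3] = { pixels[i].r, pixels[i].g, pixels[i].b };
		for (int k = 0; k < 3; k++) {
			lo[k] = std::min(lo[k], c[k]);
			hi[k] = std::max(hi[k], c[k]);
		}
	}
	int widest = 0;
	for (int k = 1; k < 3; k++)
		if (hi[k] - lo[k] > hi[widest] - lo[widest])
			widest = k;
	//sort the box on that channel and split at the median
	std::sort(pixels.begin() + begin, pixels.begin() + end,
		  [widest](const Colour& a, const Colour& b) {
			uint8_t ac[3] = {a.r, a.g, a.b}, bc[3] = {b.r, b.g, b.b};
			return ac[widest] < bc[widest];
		  });
	size_t mid = begin + (end - begin) / 2;
	medianCut(pixels, begin, mid, depth - 1, palette);
	medianCut(pixels, mid, end, depth - 1, palette);
}

//depth 8 gives up to 256 palette entries; each original pixel is then mapped
//to its nearest palette colour in RGB space - that remapping is the lossy step
std::vector<Colour> buildPalette(std::vector<Colour> pixels)
{
	std::vector<Colour> palette;
	medianCut(pixels, 0, pixels.size(), 8, palette);
	return palette;
}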

Second, JPEG remains by far the best way of representing photographs on the real world internet, where the trade-off between size and quality is important. Look closely at the colour boundaries in a JPEG, especially a highly compressed one, and you can see how the edges are blurred and colours cross the boundaries – that is an artefact of the compression and indeed represents a loss (once compressed in this way the quality of the original image cannot be recovered). But it is also a highly efficient algorithm and can produce much smaller image files than the lossless PNG equivalent without any noticeable diminution in quality for the average viewer.

As a parallel, think of MP3s and FLACs for audio. Like FLACs, PNGs can be lossless and compressed – so delivering a much smaller file than the source WAV or similar. But MP3s can be much smaller again without any noticeable diminution in quality for most people on most equipment.

Yes, more and more people are connecting at 25Mb/s, but then more and more people are also connecting using GPRS, EDGE and low bandwidth 3G connections. Size still matters.

PNG is absolutely superior for representing line drawings (including artefacts such as graphs) as no one wants to see the lines start to fuzz. But that is a pretty limited subset of images on the internet.

So here is some proper advice: use JPEGs (at about 85% compression) for almost all photographs you want to put on the internet. If you have a line drawing or similar, prefer PNGs. If you have a truly lossless image (remember someone may have compressed it before it gets to you) and you need to keep it that way, then use the PNG format: but expect your file size to be very large.