Tag Archives: programming

Deductive debugging


An illustration of multithreading where the master thread forks off a number of threads which execute blocks of code in parallel. (Photo credit: Wikipedia)

This is not a story of a great debugging triumph – but it does point to a great debugging truth: studying the bug before you start to pore over your code is likely to get you to a solution faster.

My example is my code to simulate Belady’s “OPT” page replacement algorithm for a multithreaded environment.

In OPT the idea is that, when we need to make room in main memory for a new page, we select for removal the page with the longest “reuse distance” – in other words, the one we will have to wait the longest (perhaps forever) before needing to use again. This algorithm is sometimes called the “clairvoyant” algorithm because it requires foreknowledge of which memory page will get used when. That does not happen very often in general-purpose computing, but can often be the case in embedded computing, where the code does exactly the same thing over and over again.

In my case I am using a memory trace from a transcoding of 32 frames of video – a small (in terms of time on a real computing device) example of the sort of parallel task you might see in embedded devices. In the real world this runs for a few seconds – but it also generates 220GB of XML trace records spread across 18 threads.

With a single thread it’s easy to work out the reuse distance – you just look at how long it will be before a page gets referenced: you could even do this statically ahead of runtime and just look the result up if you wanted.

That is not true of multithreaded code – one thread might run fast or slow (e.g. while waiting for IO) and so the only way to do it is to measure the reuse distances for each page and for every thread:

  • For each thread calculate the minimum reuse distance of each page;
  • Then pick the page with the maximum minimum reuse distance across all threads.

I wrote some code that seemed to do this and on my testing subset of the 220GB XML it seemed to deliver good results. But whenever I ran it against the full trace it started brightly but then performance – by which I mean how fast the simulator ran through the traces in terms of the synthetic ticks generated by the simulator, or bluntly the simulated performance – just seemed to get worse and worse.

In fact the longer a simulated thread seemed to be running, the worse its performance got and the “fastest” thread was always the most recently spawned one, and the more threads that ran the worse this problem got.

Now, with severely limited memory (in this case we were simulating a 16-core NoC with 32KB of local memory per core, pooled into one flat 512KB space), performance can go downhill quite badly as the thread count climbs – but this fall-off was catastrophic, and it was plain that OPT was going to come out worse than “least recently used” (LRU) – and that just cannot be correct! I have not sat down to write a mathematical proof of that, but instinctively I know it to be true…

Reading through the code did not draw my attention to any obvious flaws, so I had to sit down and think about what the bug showed me – it worked well on short runs, the most recent thread seemed to do well and the recent threads in general did better than the longer established threads.

Even writing this down now makes it seem obvious – my code was in some way biased towards more recently created threads. And so instead of searching through all the code looking for errors, I could home in on those parts of the code that scanned through each thread.

I found such an error quite quickly but testing again showed that while the problem was diminished, it was not by much – I still had not found what was really to blame.

Another scan through the key sections of the code revealed the real error: when a thread tried to page in memory it only examined the reuse distances of itself and those threads created after it.

Thread data was stored in a linked list, but instead of starting the scan at the head of the list, the scan began at the pointer to the current thread. The result was that the most recent thread had a “perfect” OPT experience – on every page-in its reuse distances were fully considered – but at the other extreme the first thread’s reuse distances were only considered when it came to page in memory itself. So the pages it used that were not used by any other thread were fair game: they appeared to have an infinite reuse distance and so were almost certainly chosen for replacement more or less every time.

Changing the code so that the scan begins at the head of the linked list, and not at the local pointer, fixed the problem and the OPT simulator is now running – my guess is that it will show OPT to be two to three times more efficient than LRU.

 

Enhanced by Zemanta

Trying to stay calm


English: MDB Debugger Screen Shot (Photo credit: Wikipedia)

Debugging can be deeply frustrating. Not knowing how the debugger works can make it worse.

Still, there are always books (ordered today) and the deeper assurance that, in computing, nothing is truly random – your code breaks for a reason: find the reason and you fix the code.

At least that is what I am telling myself.


Plunging deeper


English: Example red-black tree (Photo credit: Wikipedia)

For the last month I have been working hard on some C/C++ code to simulate a 16 core computer.

I had already got some code that did this – written in Groovy – but the limitations of the Java VM made it just too difficult to write efficient code to do what I really wanted – which was to simulate the performance of such a 16-core device using Belady’s OPT (optimal – or so-called “clairvoyant”) page replacement policy. The Groovy code was going to take two years just to generate the memory reference strings – never mind how quickly it might “execute” the code.

Using C++ (with a C interface, as I am just a little bit more comfortable in that language) and some code I wrote during my MSc to generate a red-black tree (a semi-balanced binary tree), I was able to build some code that generates the reference strings in a few hours: it’s enough to restore your faith in traditional compiled, natively executing code.

But building the simulator to handle the reference strings is another matter. Last weekend, up in York, I was able to write many hundreds of lines and make quite a bit of progress, but nothing was really working.

A week later, after some crash self-tutoring in the GDB debugger, I have now got somewhere – but I have to confront a deeper problem: my red-black tree code is fundamentally broken somewhere.

I wrote this code in the summer of 2010 because I had had such a bad time of it on the C++ course at Birkbeck (my own fault, I didn’t pay enough attention). For close to four years I have looked on it as something of a coding pinnacle: and certainly it does quite efficiently generate a templatised red-black tree if you are just sticking new values in one after another. But it would appear (I am not quite sure if this is correct – but the evidence points that way) to be quite badly broken if you subsequently start deleting nodes from the tree and adding new nodes in: the crashes I am seeing suggest my code creates a loop – the left and right branches pointing back to the parent.

So, the work must go on and I will have to pore over some four year old code (not brilliantly documented) to fix it.


I write C++ like a C programmer


TI-Programmer in hexadecimal mode. Maybe I should go back to one of these things? (Photo credit: Wikipedia)

That’s the only conclusion I can draw as I read through A Tour of C++ – which really is a great way to teach yourself how deficient your coding style is.

It’s pretty plain that C++11 takes the language forward in a big way, while retaining the C substratum that allows journeymen like me to hobble on… but at least the book helps me catch up.


Wanted: a better C++ book


I first taught myself C++ back in 1993 – using Borland’s Turbo C++: a great product I had lots of fun with.

Photo of Bjarne Stroustrup, creator of the programming language C++. (Photo credit: Wikipedia)

After that I moved on to Microsoft’s Visual C++ (it was their unilateral cancellation of my subscription that marked a key stage in my disillusionment with Redmond).

In those days C++ was simple – people talked about templates and namespaces but nobody actually used them.

So, when I was being taught C++ at Birkbeck in 2009/10, I didn’t really pay enough attention – I thought I knew the language but I didn’t.

After that course was over I made a real effort to teach myself C++ properly and wrote some not too bad code. But then Groovy came along and nobody was much interested in my VMUFAT file driver for the Linux kernel and so both C++ and C got neglected.

Now C is like bike riding – if you haven’t done it for a while you are a bit unsteady at first but soon get the hang of it. C++ is different, though, and now that I am back to writing in both idioms I miss having a good C++ guide book.

I do have C++: The Complete Reference (4th Edition) but, in my view, this is one of the worst computer books ever written – as it simply reads like a book about C with some bolt-ons.

I also have Computing Concepts with C++ Essentials: but it’s a book about how to program, not a good reference.

What I want is something that tells me how to do things and use things without either drowning me in formal references or treating me like a newcomer.

What is the best option – is The C++ Programming Language still the best? I used to have a copy of that from 20 years ago (perhaps I still have it somewhere in the house) and it was quite good but obviously that edition has long since been superseded.

Any recommendations gratefully received in the comments.


Computing for teachers….


Announcing Code Club Pro
Training for Teachers
As many of you already know, last Friday we launched our newest Code Club initiative: Code Club Pro – Computing for Teachers. That’s right… not only is it Valentine’s day today, it is also our one-week anniversary. See press articles here: Guardian, Telegraph; and check out our new website: www.codeclubpro.org.
What is it? 

In partnership with CAS and with the support of Google, Code Club Pro will deliver CPD training and resources to primary school teachers. We’d like our teachers to feel confident and excited about the new computing curriculum and computing in general.

Why are we doing it?

 

As the teachers out there know, the new computing curriculum comes into effect this September. That’s only 6 months away. Eeek! In our experience, many feel they have not had sufficient programming experience or training. The language is totally new and frankly, can be quite off-putting. And that’s on top of changes to the rest of the curriculum. A huge task, by any measure – thank goodness our teachers are up to the challenge! We want to help. It is key that the front-line implementers of the new curriculum and policy – the teachers – should be given the skills and support they need.

 

How do we feel?

 

We are excited. It’s a great move toward children thinking like engineers: solving problems, using logic, analysing processes and creating rather than just consuming technology. These skills can be used to enrich other subjects too, like Maths, Science and English. See tef’s blog post here.

It’s quite a big task though! There are approx 200,000 teachers in the UK and we want to help as many as we can. Luckily, Google and CAS have invested in our project, and we have you – our amazing volunteers. With great supporters behind us, we know we can achieve many things.

 

Happy one week CCP Anniversary all!

Sophie Deen
PM, Code Club Pro

PS: Code Club Pro are looking for an intern. We want someone very smart and are offering heaps of exposure, experience, fun and autonomy, plus the London living wage. Check out the ad here.

PPS:
 If whilst reading this newsletter you felt a creeping sense of déjà vu, it’s probably because you already read my blog post last week…

 

(NB: I have no financial or other stake in this, I am merely a Code Club volunteer)

From two years to two hundred minutes


I rewrote my parsing code in C/C++, and a task that was likely to take two years in Groovy/Java running 18 threads – on a multicore machine with a large amount of memory (I am not exaggerating about the turnaround time, either) – completed in not much more than three hours in a single thread on the same machine.

(The new code has a C main – but the bulk of the code is in C++ which is accessed through an extern “C” interface.)


#include <iostream>
#include <cstdio>
#include "redblack.hpp"

using namespace std;

//Copyright Adrian McMenamin 2014
//acm538@york.ac.uk
//C wrap around C++ redblack tree code
//Licensed under GNU GPL version 2
//or any later version at your discretion

class pageinst
{
 private:
 long instruction;
 pageinst* next;

public:
 pageinst(long inst);
 long getinst(void);
 pageinst* getnext(void);
 void setnext(pageinst* newinst);
};

pageinst::pageinst(long inst)
{
 instruction = inst;
 next = NULL;
}

long pageinst::getinst(void)
{
 return instruction;
}

pageinst* pageinst::getnext(void)
{
 return next;
}

void pageinst::setnext(pageinst* newinst)
{
 next = newinst;
}

class pagechain
{
 friend ostream& operator<<(ostream& os, pagechain& pc);

private:
 long page;
 pageinst* head;
 pageinst* tail;

public:
 pagechain(long where);
 long getpage(void);
 bool operator==(pagechain&) const;
 bool operator<(pagechain&) const;
 pageinst* gethead(void);
 void sethead(pageinst* newhead);
 pageinst* gettail(void);
 void settail(pageinst* newtail);
};

ostream& operator<<(ostream& os, pagechain& pc)
{
 os << pc.page;
 return os;
}

long pagechain::getpage(void)
{
 return page;
}

bool pagechain::operator==(pagechain& pc) const
{
 return (page == pc.page);
}

bool pagechain::operator<(pagechain& pc) const
{
 return (page < pc.page);
}

pagechain::pagechain(long where)
{
 page = where;
 head = NULL;
 tail = NULL;
}

pageinst* pagechain::gethead(void)
{
 return head;
}

void pagechain::sethead(pageinst* newhead)
{
 pageinst* oldhead = gethead();
 head = newhead;
 if (oldhead)
  newhead->setnext(oldhead);
 if (!gettail())
  tail = newhead;
}

pageinst* pagechain::gettail(void)
{
 return tail;
}

void pagechain::settail(pageinst* newtail)
{
 pageinst* oldtail = gettail();
 tail = newtail;
 if (oldtail)
  oldtail->setnext(newtail);
 if (!gethead())
  head = newtail;
}

void killchain(pageinst* pi)
{
 //iterative rather than recursive: instruction chains can be very
 //long and deep recursion risks overflowing the stack
 while (pi != NULL) {
  pageinst* next = pi->getnext();
  delete pi;
  pi = next;
 }
}

void killtree(redblacknode<pagechain>* node)
{
 if (node == NULL)
  return;
 killtree(node->left);
 killtree(node->right);
 killchain(node->getvalue().gethead());
}

void writechain(pagechain* pc, FILE* fout)
{
 if (pc == NULL)
  return;
 fprintf(fout, "PAGE: %li ", pc->getpage());
 pageinst* pi = pc->gethead();
 while (pi) {
  fprintf(fout, ",%li", pi->getinst());
  pi = pi->getnext();
 }
 fprintf(fout, "\n");
}

void writeoutpages(redblacknode<pagechain>* node, FILE* fout)
{
 if (node == NULL)
  return;
 writeoutpages(node->left, fout);
 writechain(&node->getvalue(), fout);
 writeoutpages(node->right, fout);
}

extern "C" {

void* createtree(void)
{
 redblacktree<redblacknode<pagechain> >* tree;
 tree = new redblacktree<redblacknode<pagechain> >();
 return static_cast<void*>(tree);
}

void removetree(void* tree)
{
 redblacktree<redblacknode<pagechain> >* rbtree;
 rbtree = static_cast<redblacktree<redblacknode<pagechain> >* >(tree);
 killtree(rbtree->root);
 delete rbtree;
}

void* getroot(void* tree)
{
 redblacktree<redblacknode<pagechain> >* nodetree =
 static_cast<redblacktree<redblacknode<pagechain> >*>(tree);
 return static_cast<void*>(nodetree->root);
}

//if node for this page exists add to its tail
//if node for this page does not exist, create and add to tail
void insertinstruction(long pagenumber, long instruction,
 void* tree, void* root)
{
 redblacknode<pagechain> *rootnode, *pagenode, *dummynode;
 redblacktree<redblacknode<pagechain> >* nodetree;

 pagechain dummychain = pagechain(pagenumber);
 rootnode = static_cast<redblacknode<pagechain>*>(root);
 nodetree = static_cast<redblacktree<redblacknode<pagechain> >* >(tree);

 dummynode = new redblacknode<pagechain>(dummychain);
 pagenode = nodetree->locatenode(dummynode, rootnode);
 pageinst* nextinstruction = new pageinst(instruction);

 if (pagenode) {
  pagenode->getvalue().settail(nextinstruction);
  delete dummynode;
 } else {
  dummynode->getvalue().settail(nextinstruction);
  nodetree->insertnode(dummynode, rootnode);
 }
}

void writeinorder(void* tree, FILE* fileout)
{
 redblacktree<redblacknode<pagechain> >* nodetree =
 static_cast<redblacktree<redblacknode<pagechain> >*>
 (tree);

 writeoutpages(nodetree->root, fileout);
}

} //end extern "C"

The old code used Groovy’s Map class to index a sparse array; the new code uses a red-black tree to index the same data. The speed-up is not down to the tree’s superior performance alone, but it surely has a lot to do with it.

I now have to decide whether to re-write the whole of the testing code in C/C++ … I think I am going to have to.


The joy of .CXX


I have a problem.

The rise of the C guru. (LOL =]) (Photo credit: Dzhus)

I need to parse 220GB of XML to find an OPT reference string (i.e. to profile the memory references of a running application to create a page replacement algorithm that will know which is the most efficient page replacement decision.)

I had one algorithm in mind for this and I wrote the Groovy code but it kept exhausting the memory pool of the VM (as it generated a lot of anonymous memory). Even whacking the VM up in size quite considerably didn’t seem to fix the problem.

So I used another algorithm – and it works. After a fashion. Because even with 18 threads running in parallel on the University of York’s big iron compute server I think it might take around two years to complete. And I don’t have two years.

So I need to find something better – something where I have much more control over the memory allocation code and where I can also be sure of performance.

C is the obvious answer. Or, in this case, C with an interface to some C++ code I wrote a few years ago to build a red-black tree.

The strange thing is that, despite Groovy’s claim to remove the really tedious bits of code writing by doing away with boilerplate, this weekend I have found writing all that getter/setter code quite therapeutic: perhaps this is a programmer’s comfort blanket?


Give yourself a Christmas present: learn sed


A Shebang, also Hashbang or Sharp bang. (Photo credit: Wikipedia)

Text is at the core of The Unix Way – and all True Unix Hackers work from the command line. This much you know.

(If you don’t, get a copy of The Art of Unix Programming – there is an awful lot of rubbish in that book but it does do one thing well: explain the deep connection between text and Unix.)

In a practical sense this means to get the best from your Unix system (and this includes you if you are a Mac OSX user) you need to boost your command line skills. The first thing to do is, of course, become familiar with a text editor – either vi or emacs (I am a vi user, but refuse to engage in a religious war on this matter.)

Then, perhaps not the next thing, but one of the next things you should do is learn sed – the streaming editor – one of the many gifts to the world (including Unix, of course) from Bell Labs (I recently read The Idea Factory: Bell Labs and the Great Age of American Innovation and I suppose I really ought to get around to writing a review of that).

Sed comes from the 1970s but, as so often in computing, it feels to me that its time has come round again – in the era of big data, a program that allows you to edit a file one line at a time, as opposed to trying to read as much of the file as possible into your computer’s memory, is just what is needed.

If you are sufficiently long in the tooth to have messed about with Microsoft’s edlin or any other line editor you might be forgiven for giving a hollow laugh at this point – but sed is a tool that genuinely repays the effort you have to make to learn it.

In the last few weeks I have been messing about with 220GB XML files and even the University of York’s big iron compute server cannot handle a buffered edit of a file that size – sed is the only realistic alternative. (Actually I thought about using my own hex editor – hexxed – which is also essentially a line editor, but a hex editor is really for messing about with binary files and I wouldn’t recommend it.)

Sed has allowed me to fix errors deep inside very large files with just a few commands – eg:

LANG=C sed '51815253s@^.*$@<instruction address='\''004cf024'\'' size='\''03'\'' />@' infile.xml > outfile.xml

Fixes line 51,815,253 in my file (the line identified by an XML fatal error). Earlier I had executed another line of sed to see what was wrong with that line:

LANG=C sed -n '51815253p' infile.xml

(The LANG=C prefix is because the breakage involved an alien locale seemingly being injected into my file.)

Sed allows you to do much more – for instance anything you can identify through a pattern can be altered. Let’s say you have (text) documents with your old email address – me@oldaddress.com – and you want to change that to your new address – me@newaddress.com …

sed 's/me@oldaddress\.com/me@newaddress\.com/g' mytext.txt > newtext.txt

Then check newtext.txt for correctness before using mv to replace the original.

But there is much, much more you can do with it.

Plus you get real cred as a Unix hacker if you know it.

Now, too many programs these days – especially anything from Redmond – go out of their way to suppress text formats. Text, after all, is resistant to the “embrace and extend” methodology – text wants to be free. But there is plenty of it out there still.

Books that teach you about sed are not so plentiful – I have been reading an old edition of sed & awk – which seems to be out of print – though you can buy a second hand copy for less than a quid excluding postage costs. Well worth the investment, I’d say.

How slow is a fast computer?


Valgrind
Valgrind (Photo credit: Wikipedia)

I am working on a simulation environment for a NoC. The idea is that we can take a memory reference string – generated by Valgrind – and then test how long it will take to execute with different memory models for the NoC. It’s at an early stage and still quite crude.

The data set it is working with is of the order of 200GB, though that covers 18 threads of execution and so, very roughly speaking, it is 18 data sets of just over 10GB each. I have written some parallel Groovy/Java code to handle it and the code seems to work, though there is a lot of work to be done.

I am running it on the University of York’s compute server – a beast with 32 cores and a lot of memory. But it is slow, slow, slow. My current estimate is that it would take about 10 weeks to crunch a whole dataset. The code is slow because we have to synchronise the threads to model the inherent parallelism of the NoC. The whole thing is a demonstration – with a vengeance – of Amdahl’s Law.

Even in as long a project as a PhD I don’t have 10 weeks per dataset going free, so this is a problem!