C, C, glorious C

gdb icon, created for the Open Icon Library
gdb icon, created for the Open Icon Library (Photo credit: Wikipedia)

This blog – The Unreasonable Effectiveness of C – is really very good, and makes a lot of points that are very difficult to argue with.

Though, lately, I have (re-)discovered the joys of C++ – even if I do write code in that language like a C programmer.

In the last six months I have written a lot of code designed to process a lot of data. I started off with Groovy – my data was XML and Groovy comes with tools that are specifically designed to make a programmer’s life a lot easier when handling XML.

But, in the end, Groovy just was not up to the task – because for my needs: processing 200 GB of XML, Groovy (or, rather, the Java VM on which it runs) is just not fast or flexible enough.

My response to that was to go “route 1” and reach for C. I know and knew I could build what I wanted in C and that it would be as fast as I could reasonably hope to get it (going down to Assembly was just not a serious option).

However, the code I am actually using is in C++. I found that the abstractions offered by the STL were worth the price in terms of speed of execution and code that is somewhat opaque to the GDB debugger.

It’s a compromise, of course, but I suspect if I had tried to write the equivalent of the STL map and set classes in C, I’d still be debugging my implementation many weeks later – after all I found that my C++ red-black tree code was actually broken despite using it (successfully) for a number of years.

Real programmers have to make these compromises – and C++ struck the right note between language expressiveness and language power for me.

Enhanced by Zemanta

From two years to two hundred minutes

I rewrote my parsing code in C/C++ and a task that was likely to take two years in Groovy/Java while running 18 threads – on a multicore machine with a large amount of memory – (I am not exaggerating about the turnaround time, either) completed in not much more than three hours in a single thread on the same machine.

(The new code has a C main – but the bulk of the code is in C++ which is accessed through an extern “C” interface.)

#include <iostream>
#include <cstdio>
#include "redblack.hpp"

using namespace std;

//Copyright Adrian McMenamin 2014
//C wrap around C++ redblack tree code
//Licensed under GNU GPL verison 2
//or any later version at your discretion

class pageinst
 long instruction;
 pageinst* next;

 pageinst(long inst);
 long getinst(void);
 pageinst* getnext(void);
 void setnext(pageinst* newinst);

pageinst::pageinst(long inst)
 instruction = inst;
 next = NULL;

long pageinst::getinst(void)
 return instruction;

pageinst* pageinst::getnext(void)
 return next;

void pageinst::setnext(pageinst* newinst)
 next = newinst;

class pagechain
 friend ostream& operator<<(ostream& os, pagechain& pc);

 long page;
 pageinst* head;
 pageinst* tail;

 pagechain(long where);
 long getpage(void);
 bool operator==(pagechain&) const;
 bool operator<(pagechain&) const;
 pageinst* gethead(void);
 void sethead(pageinst* newhead);
 pageinst* gettail(void);
 void settail(pageinst* newtail);

ostream& operator<<(ostream& os, pagechain& pc)
 os << pc.page;
 return os;

long pagechain::getpage(void)
 return page;

bool pagechain::operator==(pagechain& pc) const
 return (page == pc.page);

bool pagechain::operator<(pagechain& pc) const
 return (page < pc.page);

pagechain::pagechain(long where)
 page = where;
 head = NULL;
 tail = NULL;

pageinst* pagechain::gethead(void)
 return head;

void pagechain::sethead(pageinst* newhead)
 pageinst* oldhead = gethead();
 head = newhead;
 if (oldhead)
 if (!gettail())
 tail = newhead;

pageinst* pagechain::gettail(void)
 return tail;

void pagechain::settail(pageinst* newtail)
 pageinst* oldtail = gettail();
 tail = newtail;
 if (oldtail)
 if (!gethead())
 head = newtail;

void killchain(pageinst* pi)
 if (pi == NULL)
 pageinst* next = pi->getnext();
 delete pi;

void killtree(redblacknode<pagechain>* node)
 if (node == NULL)

void writechain(pagechain* pc, FILE* fout)
 if (pc == NULL)
 fprintf(fout, "PAGE: %li ", pc->getpage());
 pageinst* pi = pc->gethead();
 while (pi) {
 fprintf(fout, ",%li", pi->getinst());
 pi = pi->getnext();
 fprintf(fout, "\n");

void writeoutpages(redblacknode<pagechain>* node, FILE* fout)
 if (node == NULL)
 writeoutpages(node->left, fout);
 writechain(&node->getvalue(), fout);
 writeoutpages(node->right, fout);

extern "C" {

void* createtree(void)
 redblacktree<redblacknode<pagechain> >* tree;
 tree = new redblacktree<redblacknode<pagechain> >();
 return static_cast<void*>(tree);

void removetree(void* tree)
 redblacktree<redblacknode >* rbtree;
 rbtree = (static_cast<redblacktree<redblacknode<pagechain> >* >(tree));
 delete rbtree;
 tree = NULL;

void* getroot(void* tree)
 redblacktree<redblacknode<pagechain> >* nodetree =
 static_cast<redblacktree<redblacknode<pagechain> >*>(tree);
 return static_cast<void*>(nodetree->root);

//if node for this page exists add to its tail
//if node for this page does not exist, create and add to tail
void insertinstruction(long pagenumber, long instruction,
 void* tree, void* root)
 redblacknode<pagechain> *rootnode, *pagenode, *dummynode;
 redblacktree<redblacknode<pagechain> >* nodetree;

 pagechain dummychain = pagechain(pagenumber);
 rootnode = static_cast<redblacknode<pagechain>*>(root);
 nodetree = static_cast<redblacktree<redblacknode<pagechain> >* >(tree);

dummynode = new redblacknode<pagechain>(dummychain);
 pagenode = nodetree->locatenode(dummynode, rootnode);
 pageinst* nextinstruction = new pageinst(instruction);

if (pagenode) {
 delete dummynode;
 } else {
 nodetree->insertnode(dummynode, rootnode);

void writeinorder(void* tree, FILE* fileout)
 redblacktree<redblacknode<pagechain> >* nodetree =
 static_cast<redblacktree<redblacknode<pagechain> >*>

 writeoutpages(nodetree->root, fileout);

} //end extern "C"

The old code used Groovy’s Map class to index a sparse array, the new code uses a red-black tree to index the same data. The speed up is not just because of the tree’s superior performance, but it surely has a lot to do with it.

I now have to decide whether to re-write the whole of the testing code in C/C++ … I think I am going to have to.

Enhanced by Zemanta

How slow is a fast computer?

Valgrind (Photo credit: Wikipedia)

I am working on a simulation environment for a NoC. The idea is that we can take a memory reference string – generated by Valgrind – and then test how long it will take to execute with different memory models for the NoC. It’s at an early stage and still quite crude.

The data set it is working with is of the order of 200GB, though that covers 18 threads of execution and so, very roughly speaking, it is 18 data sets of just over 10GB each. I have written some parallel Groovy/Java code to handle it and the code seems to work, though there is a lot of work to be done.

I am running it on the University of York’s compute server – a beast with 32 cores and a lot of memory. But it is slow, slow, slow. My current estimate is that it would take about 10 weeks to crunch a whole dataset. The code is slow because we have to synchronise the threads to model the inherent parallelism of the NoC. The whole thing is a demonstration – with a vengeance – of Amdahl’s Law.

Even in as long a project as a PhD I don’t have 10 weeks per dataset going free, so this is a problem!

New, improved Hexxed

I have not had much luck in hunting down what is wrong with my code or the Xerces-c SAX2 parser – but I do think I have successfully updated by hex editor, Hexxed, to handle 64 bit (ie >4GB) files.

Indeed it performs rather better than vi for some editing tasks (Hexxed has a vi like interface).

So, if a hex editor, capable of handling little and big endian code and able to display output in Unicode is what you are after, and if you are vi-conditioned, then maybe Hexxed is your thing.

Groovy code can be found at: https://github.com/mcmenaminadrian/hexxed

While a runnable jar for those of you who have Java but are not yet Groovy can be downloaded at:

And there is more about it here: https://cartesianproduct.wordpress.com/2012/06/03/hexxed-usage-options/

Just remember it is code for playing with – don’t bet the farm on it. But, that said, I have no reason to think it does not work.

Free software hex editor

I have noticed that my free software hex editor (hexxed) – which is licensed under the GNU GPL – does not really come up in any searches, so here’s another entry to boost it.

It’s a bit crude, but it does some things well (e.g., display unicode and switch endianness) and it will run anywhere you can get Java to work. And as it is free software anyone is free to make it better.

This page tells you all about how to get it and use it.

Missing coding

English: Programmer
English: Programmer (Photo credit: Wikipedia)

Ever been engaged in an intellectual activity where the hours whizz by much faster than you think, as you puzzle over and round the issues while feeling an intense pleasure that makes the rest of the world seem less important?  This what is called “flow” and, generally, it is what I feel when I am coding.

I am not the world’s greatest coder, to be honest I am little better than average (though doing the MSc at Birkbeck made me so much better than I used to be). The pleasure doesn’t come from having a natural skill that means I can write hundreds of lines at a single sitting: like a typical programmer, if I got 20 fully debugged lines out a day, every day, I would count that as decent performance.

But lately I haven’t done any coding at all (apart from a few lines of scripting in the office to ensure SMB mounts are automatic and such like). Instead I have read a lot of computer science papers and spent a lot of time working on a presentation I need to make and a literature review that will come after.

But I miss the coding, and I am missing it more every day.

Now, coding is also very more-ish. If you code to scratch an itch then chances are you make yourself itchy by coding. So earlier this year I wrote a Groovy/Java hex editor – Hexxed – after I wrote a Linux filesystem where I could not find a hex editor that did what I wanted to do, and so on.

So, even as I puzzle about whether I should write some code just for the sake of a mental stretch, I also wonder what I would write.

The coming HTML5 disaster

HTML5 official logo (official since 1 April 20...
HTML5 official logo (official since 1 April 2011, (Photo credit: Wikipedia)

About 18 months ago I got my first Android phone. One of the first applications I downloaded on it was for Facebook. It had some quirks but it worked fine.

Not long after I was prompted to ‘upgrade’ to the next version, which I duly did.

The supposed upgrade was (and is) a disaster. Slow, difficult to understand, a mess.

I had always wondered why Facebook had not simply rolled back the upgrade and tried again. But now I know. To cut their costs they had based their iOS and Android applications on a common HTML5 core. A common code base eliminated the need to maintain two separate blocks of complex code, presumably with two sets of developers.

But it didn’t work. By all accounts the iOS version made the Android one look slick and this week it was axed in favour of an Objective C based application. Hopefully a Java based Android replacement is also in the works.

But I suspect sloth will be the least of HTML5’s problems. Turning mark up into executable code just sounds like a recipe for trouble and it’s only just started.

Running BASIC on the Raspberry Pi

Actually, I ran BINSIC, my very own dialect of BASIC on the Raspberry Pi – it is very slow (a bit slower even than a ZX81 back in the day) but it does work.Raspberry Pi running

Haven’t had a chance to investigate what happens if I tweak the settings on the thing – possibly I might be able to speed execution up. Could be that Java and Groovy is just too much bloat, could be that BINSIC just demands a lot of computation (I refuse to consider that it might be poorly designed and executed).

Some fixes for BINSIC

I have made a few small, but important, fixes to BINSIC, the reimplementation of BASIC I have built using Groovy.

You can download the jar file (which can be run in any standard Java environment) from here:

Here’s another BASIC “game” (it’s amazing that this sort of thing used to fascinate those of us with these machines), for you to try – it was fixing this up that helped me find the bugs:

10 REM **DICE GAME**SLR/1983**
20 LET A=0
30 LET B=0
70 GOSUB 160
90 GOSUB 220
130 LET X$ = INKEY$
132 IF X$ = "" THEN GOTO 130
136 CLS
140 IF X$ = "S" THEN STOP
150 GOTO 10
160 FOR G=1 TO 2
170 LET Z=INT (RND*6+1)
180 LET A=A+Z
190 GOSUB 280
200 NEXT G
220 FOR G=1 TO 2
230 LET Z=INT (RND*6+1)
240 LET B=B+Z
250 GOSUB 280
260 NEXT G
280 REM Draw
282 PAUSE 100
285 IF Z = 1 THEN GOSUB 500
290 IF Z = 2 THEN GOSUB 600
300 IF Z = 3 THEN GOSUB 700
310 IF Z = 4 THEN GOSUB 800
320 IF Z = 5 THEN GOSUB 900
330 IF Z = 6 THEN GOSUB 1000
400 PRINT "[ ][ ][ ]"
420 PRINT "[*][ ][*]"
440 PRINT "[ ][*][ ]"
460 PRINT "[*][ ][ ]"
480 PRINT "[ ][ ][*]"
500 REM 1
510 GOSUB 400
520 GOSUB 440
530 GOSUB 400
600 REM 2
610 GOSUB 460
620 GOSUB 400
630 GOSUB 480
700 REM 3
710 GOSUB 460
720 GOSUB 440
730 GOSUB 480
800 REM 4
810 GOSUB 420
820 GOSUB 400
830 GOSUB 420
900 REM 5
910 GOSUB 420
920 GOSUB 440
930 GOSUB 420
1000 REM 6
1010 GOSUB 420
1020 GOSUB 420
1030 GOSUB 420

The original game can be found here: http://zx81.reids4fun.com/zx81/dice/dice_list.html – I had to change it to cope with the main weakness of BINSIC – that GOTOs to lines inside loops fail.

A problem with Life

Bill Gosper's Glider Gun in action—a variation...
Bill Gosper’s Glider Gun in action—a variation of Conway’s Game of Life. This image was made by using Life32 v2.15 beta, by Johan G. Bontes. (Photo credit: Wikipedia)

I had hoped to “launch” BINSIC – Binsic Is Not Sinclair Instruction Code – my BASIC-as-a-DSL project built using Groovy, this weekend. For the launch I wanted to publish a jar file (so usable by everyone with Java) that ran the version of Conway’s Game of Life (seemed very appropriate for both general – Life being the ultimate hacker meme – and personal – I once wrote a version of Life in Z80 machine code for the ZX80 – reasons) found in Basic Computer Games – but I don’t think I am going to manage it now

The problem is that the code in the book is totally banjaxxed. It uses variables before they are declared and in general looks as though either some lines have been transposed or some code has been omitted altogether. It is certainly plain that the printout of the running program in the book does not reflect the code found on its pages. In any case hacking at this code reveals the full horror of BASIC and how difficult it is to maintain code that is not even in blocks, never mind any other sort of order.

So I have two choices – refactor the BASIC I have in quite a big way to get it to run, or find some new code instead.

But the exercise has not been completely wasted. While hunting down the bugs in David H Ahl’s code I have found more than a few in BINSIC itself.