C, C, glorious C

gdb icon, created for the Open Icon Library
gdb icon, created for the Open Icon Library (Photo credit: Wikipedia)

This blog – The Unreasonable Effectiveness of C – is really very good, and makes a lot of points that are very difficult to argue with.

Though, lately, I have (re-)discovered the joys of C++ – even if I do write code in that language like a C programmer.

In the last six months I have written a lot of code designed to process a lot of data. I started off with Groovy – my data was XML and Groovy comes with tools that are specifically designed to make a programmer’s life a lot easier when handling XML.

But, in the end, Groovy just was not up to the task – because for my needs: processing 200 GB of XML, Groovy (or, rather, the Java VM on which it runs) is just not fast or flexible enough.

My response to that was to go “route 1” and reach for C. I know and knew I could build what I wanted in C and that it would be as fast as I could reasonably hope to get it (going down to Assembly was just not a serious option).

However, the code I am actually using is in C++. I found that the abstractions offered by the STL were worth the price in terms of speed of execution and code that is somewhat opaque to the GDB debugger.

It’s a compromise, of course, but I suspect if I had tried to write the equivalent of the STL map and set classes in C, I’d still be debugging my implementation many weeks later – after all I found that my C++ red-black tree code was actually broken despite using it (successfully) for a number of years.

Real programmers have to make these compromises – and C++ struck the right note between language expressiveness and language power for me.

Enhanced by Zemanta

From two years to two hundred minutes

I rewrote my parsing code in C/C++ and a task that was likely to take two years in Groovy/Java while running 18 threads – on a multicore machine with a large amount of memory – (I am not exaggerating about the turnaround time, either) completed in not much more than three hours in a single thread on the same machine.

(The new code has a C main – but the bulk of the code is in C++ which is accessed through an extern “C” interface.)

#include <iostream>
#include <cstdio>
#include "redblack.hpp"

using namespace std;

//Copyright Adrian McMenamin 2014
//C wrap around C++ redblack tree code
//Licensed under GNU GPL verison 2
//or any later version at your discretion

class pageinst
 long instruction;
 pageinst* next;

 pageinst(long inst);
 long getinst(void);
 pageinst* getnext(void);
 void setnext(pageinst* newinst);

pageinst::pageinst(long inst)
 instruction = inst;
 next = NULL;

long pageinst::getinst(void)
 return instruction;

pageinst* pageinst::getnext(void)
 return next;

void pageinst::setnext(pageinst* newinst)
 next = newinst;

class pagechain
 friend ostream& operator<<(ostream& os, pagechain& pc);

 long page;
 pageinst* head;
 pageinst* tail;

 pagechain(long where);
 long getpage(void);
 bool operator==(pagechain&) const;
 bool operator<(pagechain&) const;
 pageinst* gethead(void);
 void sethead(pageinst* newhead);
 pageinst* gettail(void);
 void settail(pageinst* newtail);

ostream& operator<<(ostream& os, pagechain& pc)
 os << pc.page;
 return os;

long pagechain::getpage(void)
 return page;

bool pagechain::operator==(pagechain& pc) const
 return (page == pc.page);

bool pagechain::operator<(pagechain& pc) const
 return (page < pc.page);

pagechain::pagechain(long where)
 page = where;
 head = NULL;
 tail = NULL;

pageinst* pagechain::gethead(void)
 return head;

void pagechain::sethead(pageinst* newhead)
 pageinst* oldhead = gethead();
 head = newhead;
 if (oldhead)
 if (!gettail())
 tail = newhead;

pageinst* pagechain::gettail(void)
 return tail;

void pagechain::settail(pageinst* newtail)
 pageinst* oldtail = gettail();
 tail = newtail;
 if (oldtail)
 if (!gethead())
 head = newtail;

void killchain(pageinst* pi)
 if (pi == NULL)
 pageinst* next = pi->getnext();
 delete pi;

void killtree(redblacknode<pagechain>* node)
 if (node == NULL)

void writechain(pagechain* pc, FILE* fout)
 if (pc == NULL)
 fprintf(fout, "PAGE: %li ", pc->getpage());
 pageinst* pi = pc->gethead();
 while (pi) {
 fprintf(fout, ",%li", pi->getinst());
 pi = pi->getnext();
 fprintf(fout, "\n");

void writeoutpages(redblacknode<pagechain>* node, FILE* fout)
 if (node == NULL)
 writeoutpages(node->left, fout);
 writechain(&node->getvalue(), fout);
 writeoutpages(node->right, fout);

extern "C" {

void* createtree(void)
 redblacktree<redblacknode<pagechain> >* tree;
 tree = new redblacktree<redblacknode<pagechain> >();
 return static_cast<void*>(tree);

void removetree(void* tree)
 redblacktree<redblacknode >* rbtree;
 rbtree = (static_cast<redblacktree<redblacknode<pagechain> >* >(tree));
 delete rbtree;
 tree = NULL;

void* getroot(void* tree)
 redblacktree<redblacknode<pagechain> >* nodetree =
 static_cast<redblacktree<redblacknode<pagechain> >*>(tree);
 return static_cast<void*>(nodetree->root);

//if node for this page exists add to its tail
//if node for this page does not exist, create and add to tail
void insertinstruction(long pagenumber, long instruction,
 void* tree, void* root)
 redblacknode<pagechain> *rootnode, *pagenode, *dummynode;
 redblacktree<redblacknode<pagechain> >* nodetree;

 pagechain dummychain = pagechain(pagenumber);
 rootnode = static_cast<redblacknode<pagechain>*>(root);
 nodetree = static_cast<redblacktree<redblacknode<pagechain> >* >(tree);

dummynode = new redblacknode<pagechain>(dummychain);
 pagenode = nodetree->locatenode(dummynode, rootnode);
 pageinst* nextinstruction = new pageinst(instruction);

if (pagenode) {
 delete dummynode;
 } else {
 nodetree->insertnode(dummynode, rootnode);

void writeinorder(void* tree, FILE* fileout)
 redblacktree<redblacknode<pagechain> >* nodetree =
 static_cast<redblacktree<redblacknode<pagechain> >*>

 writeoutpages(nodetree->root, fileout);

} //end extern "C"

The old code used Groovy’s Map class to index a sparse array, the new code uses a red-black tree to index the same data. The speed up is not just because of the tree’s superior performance, but it surely has a lot to do with it.

I now have to decide whether to re-write the whole of the testing code in C/C++ … I think I am going to have to.

Enhanced by Zemanta

How slow is a fast computer?

Valgrind (Photo credit: Wikipedia)

I am working on a simulation environment for a NoC. The idea is that we can take a memory reference string – generated by Valgrind – and then test how long it will take to execute with different memory models for the NoC. It’s at an early stage and still quite crude.

The data set it is working with is of the order of 200GB, though that covers 18 threads of execution and so, very roughly speaking, it is 18 data sets of just over 10GB each. I have written some parallel Groovy/Java code to handle it and the code seems to work, though there is a lot of work to be done.

I am running it on the University of York’s compute server – a beast with 32 cores and a lot of memory. But it is slow, slow, slow. My current estimate is that it would take about 10 weeks to crunch a whole dataset. The code is slow because we have to synchronise the threads to model the inherent parallelism of the NoC. The whole thing is a demonstration – with a vengeance – of Amdahl’s Law.

Even in as long a project as a PhD I don’t have 10 weeks per dataset going free, so this is a problem!

New, improved Hexxed

I have not had much luck in hunting down what is wrong with my code or the Xerces-c SAX2 parser – but I do think I have successfully updated by hex editor, Hexxed, to handle 64 bit (ie >4GB) files.

Indeed it performs rather better than vi for some editing tasks (Hexxed has a vi like interface).

So, if a hex editor, capable of handling little and big endian code and able to display output in Unicode is what you are after, and if you are vi-conditioned, then maybe Hexxed is your thing.

Groovy code can be found at: https://github.com/mcmenaminadrian/hexxed

While a runnable jar for those of you who have Java but are not yet Groovy can be downloaded at:

And there is more about it here: https://cartesianproduct.wordpress.com/2012/06/03/hexxed-usage-options/

Just remember it is code for playing with – don’t bet the farm on it. But, that said, I have no reason to think it does not work.

Free software hex editor

I have noticed that my free software hex editor (hexxed) – which is licensed under the GNU GPL – does not really come up in any searches, so here’s another entry to boost it.

It’s a bit crude, but it does some things well (e.g., display unicode and switch endianness) and it will run anywhere you can get Java to work. And as it is free software anyone is free to make it better.

This page tells you all about how to get it and use it.

Missing coding

English: Programmer
English: Programmer (Photo credit: Wikipedia)

Ever been engaged in an intellectual activity where the hours whizz by much faster than you think, as you puzzle over and round the issues while feeling an intense pleasure that makes the rest of the world seem less important?  This what is called “flow” and, generally, it is what I feel when I am coding.

I am not the world’s greatest coder, to be honest I am little better than average (though doing the MSc at Birkbeck made me so much better than I used to be). The pleasure doesn’t come from having a natural skill that means I can write hundreds of lines at a single sitting: like a typical programmer, if I got 20 fully debugged lines out a day, every day, I would count that as decent performance.

But lately I haven’t done any coding at all (apart from a few lines of scripting in the office to ensure SMB mounts are automatic and such like). Instead I have read a lot of computer science papers and spent a lot of time working on a presentation I need to make and a literature review that will come after.

But I miss the coding, and I am missing it more every day.

Now, coding is also very more-ish. If you code to scratch an itch then chances are you make yourself itchy by coding. So earlier this year I wrote a Groovy/Java hex editor – Hexxed – after I wrote a Linux filesystem where I could not find a hex editor that did what I wanted to do, and so on.

So, even as I puzzle about whether I should write some code just for the sake of a mental stretch, I also wonder what I would write.

The coming HTML5 disaster

HTML5 official logo (official since 1 April 20...
HTML5 official logo (official since 1 April 2011, (Photo credit: Wikipedia)

About 18 months ago I got my first Android phone. One of the first applications I downloaded on it was for Facebook. It had some quirks but it worked fine.

Not long after I was prompted to ‘upgrade’ to the next version, which I duly did.

The supposed upgrade was (and is) a disaster. Slow, difficult to understand, a mess.

I had always wondered why Facebook had not simply rolled back the upgrade and tried again. But now I know. To cut their costs they had based their iOS and Android applications on a common HTML5 core. A common code base eliminated the need to maintain two separate blocks of complex code, presumably with two sets of developers.

But it didn’t work. By all accounts the iOS version made the Android one look slick and this week it was axed in favour of an Objective C based application. Hopefully a Java based Android replacement is also in the works.

But I suspect sloth will be the least of HTML5’s problems. Turning mark up into executable code just sounds like a recipe for trouble and it’s only just started.