Your assembly is my domain-specific language


A few years ago I set out to write a clone of Sinclair BASIC as a domain-specific language (DSL) in Groovy. The end result was BINSIC, but it was much closer to an interpreter than a DSL: it turned out that the strange way Groovy handles capitalised function/closure names meant I could not match keywords at all, and so interpretation became the only option (though there were DSL-like features lurking under the covers).

Now I have set out to write an interpreter for RISCV assembly and have written something which is much more like a DSL. My aim was to track the memory reference patterns of various real-time benchmarks running on RISCV systems – based on the disassembly generated by Spike, the RISCV emulator.

Getting that done requires tracking register state – because, in true RISC fashion, reads and writes are done not to immediate addresses but to addresses held in registers, with immediates used as offsets.

To make this happen, every (or at least every used) RISCV instruction is mapped to a closure, with the operands treated as closure parameters. The closures all live inside the RegisterFile class, so all of them can access the registers to keep the state updated. This makes it quite like a DSL, but I make no claims for purity: every statement is passed through a regular-expression-based interpreter to separate out the parameters; if nothing else, that eliminates a lot of boilerplate code that would otherwise have to be stuck inside the closures.
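The shape of that dispatch is easy to sketch. The real code is Groovy; the same idea in C++ (all names and the instruction pattern here are illustrative, not the actual interpreter code) looks something like this:

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <regex>
#include <string>

// Illustrative sketch only: each mnemonic maps to a handler (closure),
// and a regular expression splits out the operands before dispatch.
struct RegisterFile {
    std::map<std::string, int64_t> regs;
    std::map<std::string,
             std::function<void(const std::string&, const std::string&,
                                const std::string&)>> ops;

    RegisterFile() {
        // "addi rd, rs1, imm": rd = rs1 + immediate
        ops["addi"] = [this](const std::string& rd, const std::string& rs1,
                             const std::string& imm) {
            regs[rd] = regs[rs1] + std::stol(imm);
        };
    }

    // Split e.g. "addi a0, a1, 16" into mnemonic and operands, then dispatch
    void execute(const std::string& line) {
        static const std::regex pattern(R"((\w+)\s+(\w+),\s*(\w+),\s*(-?\w+))");
        std::smatch m;
        if (std::regex_match(line, m, pattern))
            ops.at(m[1].str())(m[2].str(), m[3].str(), m[4].str());
    }
};
```

So `rf.execute("addi a0, a1, 16")` looks up the `addi` closure and updates `a0` from `a1` – the mnemonic-to-closure table carries the state, and the regular expression does the boilerplate operand-splitting once, for every instruction.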

Memory is treated as a sparse array through a Groovy Map instance – with an address (64-bit Long) mapped to an 8-bit Byte.

The process isn’t perfect – a disassembly doesn’t give you the contents of initialised data, so an attempt to access those addresses is quite likely to raise an exception, which has to be trapped and the memory accounted for – by simply assigning zero to the address. It’s a kludge, but it seems to work.
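As a sketch of the idea (the real code is a Groovy Map; this C++ version is illustrative), `std::map`’s `operator[]` even gives you the assign-zero-on-first-touch behaviour for free:

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Sparse memory: 64-bit address -> byte. std::map's operator[]
// value-initialises a missing entry to 0, which mirrors the
// "assign zero on first touch" kludge directly.
class SparseMemory {
    std::map<uint64_t, uint8_t> mem;
public:
    void write(uint64_t addr, uint8_t value) { mem[addr] = value; }
    uint8_t read(uint64_t addr) { return mem[addr]; }  // unseen address -> 0
    size_t touched() const { return mem.size(); }      // entries actually used
};
```

Only addresses actually referenced ever occupy space, so a 64-bit address range costs nothing until the reference string touches it.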

C, C, glorious C


gdb icon, created for the Open Icon Library (Photo credit: Wikipedia)

This blog – The Unreasonable Effectiveness of C – is really very good, and makes a lot of points that are very difficult to argue with.

Though, lately, I have (re-)discovered the joys of C++ – even if I do write code in that language like a C programmer.

In the last six months I have written a lot of code designed to process a lot of data. I started off with Groovy – my data was XML and Groovy comes with tools that are specifically designed to make a programmer’s life a lot easier when handling XML.

But, in the end, Groovy just was not up to the task: for my needs – processing 200 GB of XML – Groovy (or, rather, the Java VM on which it runs) is just not fast or flexible enough.

My response to that was to go “route 1” and reach for C. I knew I could build what I wanted in C, and that it would be as fast as I could reasonably hope to get (going down to assembly was just not a serious option).

However, the code I am actually using is in C++. I found that the abstractions offered by the STL were worth the price in terms of speed of execution and code that is somewhat opaque to the GDB debugger.

It’s a compromise, of course, but I suspect that if I had tried to write the equivalent of the STL map and set classes in C, I’d still be debugging my implementation many weeks later – after all, I found that my C++ red-black tree code was actually broken despite having used it (successfully) for a number of years.

Real programmers have to make these compromises – and C++ struck the right note between language expressiveness and language power for me.

Enhanced by Zemanta

Plunging deeper


Example red-black tree (Photo credit: Wikipedia)

For the last month I have been working hard on some C/C++ code to simulate a 16-core computer.

I had already got some code that did this – written in Groovy – but the limitations of the Java VM made it just too difficult to write efficient code to do what I really wanted: to simulate the performance of such a 16-core device using Belady’s OPT (optimal – or so-called “clairvoyant”) page replacement policy. The Groovy code was going to take two years just to generate the memory reference strings – never mind how quickly it might “execute” the code.
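Belady’s OPT is simple to state even though it is impractical without hindsight: on a page fault with all frames full, evict the page whose next use lies furthest in the future. A minimal C++ sketch (illustrative only, not the simulator code):

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Belady's OPT ("clairvoyant") policy: on a fault with full frames,
// evict the page whose next reference is furthest in the future.
// Returns the number of page faults for the whole reference string.
size_t opt_faults(const std::vector<long>& refs, size_t frames) {
    std::set<long> resident;
    size_t faults = 0;
    for (size_t i = 0; i < refs.size(); ++i) {
        if (resident.count(refs[i])) continue;   // hit: nothing to do
        ++faults;
        if (resident.size() == frames) {
            // find the resident page with the most distant next use
            long victim = 0;
            size_t furthest = 0;
            for (long page : resident) {
                size_t next = refs.size();       // "never used again"
                for (size_t j = i + 1; j < refs.size(); ++j)
                    if (refs[j] == page) { next = j; break; }
                if (next >= furthest) { furthest = next; victim = page; }
            }
            resident.erase(victim);
        }
        resident.insert(refs[i]);
    }
    return faults;
}
```

This naive version rescans the reference string on every fault; a serious implementation would index next-use positions up front – which is exactly why generating the reference strings efficiently matters so much.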

Using C++ (with a C interface, as I am just a little more comfortable in that language) and some code I wrote during my MSc to generate a red-black tree (a semi-balanced binary tree), I was able to build code that generates the reference strings in a few hours: it’s enough to restore your faith in traditional compiled, natively-executing code.

But building the simulator to handle the reference strings is another matter. Last weekend, up in York, I was able to write many hundreds of lines and make quite a bit of progress, but nothing was really working.

A week later, and after some crash self-tutoring in the GDB debugger, I have now got somewhere – but I now have to confront a deeper problem: my red-black tree code is fundamentally broken somewhere.

I wrote this code in the summer of 2010 because I had had such a bad time of it on the C++ course at Birkbeck (my own fault – I didn’t pay enough attention). For the close-to-four years since, I have looked on it as something of a coding pinnacle: and certainly it does quite efficiently generate a templatised red-black tree if you are just sticking new values in one after another. But it would appear (I am not quite sure if this is correct, but the evidence points that way) to be quite badly broken if you subsequently start deleting nodes from the tree and adding new ones in: the crashes I am seeing suggest my code creates a loop – the left and right branches pointing back to the parent.

So the work must go on, and I will have to pore over some four-year-old code (not brilliantly documented) to fix it.


Wanted: a better C++ book


I first taught myself C++ back in 1993 – using Borland’s Turbo C++: a great product I had lots of fun with.

Photo of Bjarne Stroustrup, creator of the programming language C++. (Photo credit: Wikipedia)

After that I moved on to Microsoft’s Visual C++ (it was their unilateral cancellation of my subscription that marked a key stage in my disillusionment with Redmond).

In those days C++ was simple – people talked about templates and namespaces but nobody actually used them.

So, when I was being taught C++ at Birkbeck in 2009/10, I didn’t really pay enough attention – I thought I knew the language, but I didn’t.

After that course was over I made a real effort to teach myself C++ properly and wrote some not too bad code. But then Groovy came along and nobody was much interested in my VMUFAT file driver for the Linux kernel and so both C++ and C got neglected.

Now, C is like riding a bike – if you haven’t done it for a while you are a bit unsteady at first, but you soon get the hang of it again. C++ is different, though, and now that I am back to writing in both idioms I miss having a good C++ guide book.

I do have C++: The Complete Reference (4th Edition) but, in my view, this is one of the worst computer books ever written – as it simply reads like a book about C with some bolt-ons.

I also have Computing Concepts with C++ Essentials: but it’s a book about how to program, not a good reference.

What I want is something that tells me how to do things and use things without either drowning me in formal references or treating me like a newcomer.

What is the best option – is The C++ Programming Language still the best? I used to have a copy of that from 20 years ago (perhaps I still have it somewhere in the house) and it was quite good but obviously that edition has long since been superseded.

Any recommendations gratefully received in the comments.


From two years to two hundred minutes


I rewrote my parsing code in C/C++, and a task that was likely to take two years in Groovy/Java running 18 threads – on a multicore machine with a large amount of memory (I am not exaggerating about the turnaround time, either) – completed in not much more than three hours in a single thread on the same machine.

(The new code has a C main – but the bulk of the code is in C++ which is accessed through an extern “C” interface.)


#include <iostream>
#include <cstdio>
#include "redblack.hpp"

using namespace std;

//Copyright Adrian McMenamin 2014
//acm538@york.ac.uk
//C wrap around C++ redblack tree code
//Licensed under GNU GPL version 2
//or any later version at your discretion

class pageinst
{
 private:
 long instruction;
 pageinst* next;

public:
 pageinst(long inst);
 long getinst(void);
 pageinst* getnext(void);
 void setnext(pageinst* newinst);
};

pageinst::pageinst(long inst)
{
 instruction = inst;
 next = NULL;
}

long pageinst::getinst(void)
{
 return instruction;
}

pageinst* pageinst::getnext(void)
{
 return next;
}

void pageinst::setnext(pageinst* newinst)
{
 next = newinst;
}

class pagechain
{
 friend ostream& operator<<(ostream& os, pagechain& pc);

private:
 long page;
 pageinst* head;
 pageinst* tail;

public:
 pagechain(long where);
 long getpage(void);
 bool operator==(pagechain&) const;
 bool operator<(pagechain&) const;
 pageinst* gethead(void);
 void sethead(pageinst* newhead);
 pageinst* gettail(void);
 void settail(pageinst* newtail);
};

ostream& operator<<(ostream& os, pagechain& pc)
{
 os << pc.page;
 return os;
}

long pagechain::getpage(void)
{
 return page;
}

bool pagechain::operator==(pagechain& pc) const
{
 return (page == pc.page);
}

bool pagechain::operator<(pagechain& pc) const
{
 return (page < pc.page);
}

pagechain::pagechain(long where)
{
 page = where;
 head = NULL;
 tail = NULL;
}

pageinst* pagechain::gethead(void)
{
 return head;
}

void pagechain::sethead(pageinst* newhead)
{
 pageinst* oldhead = gethead();
 head = newhead;
 if (oldhead)
 newhead->setnext(oldhead);
 if (!gettail())
 tail = newhead;
}

pageinst* pagechain::gettail(void)
{
 return tail;
}

void pagechain::settail(pageinst* newtail)
{
 pageinst* oldtail = gettail();
 tail = newtail;
 if (oldtail)
 oldtail->setnext(newtail);
 if (!gethead())
 head = newtail;
}

void killchain(pageinst* pi)
{
 if (pi == NULL)
 return;
 pageinst* next = pi->getnext();
 delete pi;
 killchain(next);
}

void killtree(redblacknode<pagechain>* node)
{
 if (node == NULL)
 return;
 killtree(node->left);
 killtree(node->right);
 killchain(node->getvalue().gethead());
}

void writechain(pagechain* pc, FILE* fout)
{
 if (pc == NULL)
 return;
 fprintf(fout, "PAGE: %li ", pc->getpage());
 pageinst* pi = pc->gethead();
 while (pi) {
 fprintf(fout, ",%li", pi->getinst());
 pi = pi->getnext();
 }
 fprintf(fout, "\n");
}

void writeoutpages(redblacknode<pagechain>* node, FILE* fout)
{
 if (node == NULL)
 return;
 writeoutpages(node->left, fout);
 writechain(&node->getvalue(), fout);
 writeoutpages(node->right, fout);
}

extern "C" {

void* createtree(void)
{
 redblacktree<redblacknode<pagechain> >* tree;
 tree = new redblacktree<redblacknode<pagechain> >();
 return static_cast<void*>(tree);
}

void removetree(void* tree)
{
 redblacktree<redblacknode<pagechain> >* rbtree;
 rbtree = static_cast<redblacktree<redblacknode<pagechain> >* >(tree);
 killtree(rbtree->root);
 delete rbtree;
 tree = NULL;
}

void* getroot(void* tree)
{
 redblacktree<redblacknode<pagechain> >* nodetree =
 static_cast<redblacktree<redblacknode<pagechain> >*>(tree);
 return static_cast<void*>(nodetree->root);
}

//if node for this page exists add to its tail
//if node for this page does not exist, create and add to tail
void insertinstruction(long pagenumber, long instruction,
 void* tree, void* root)
{
 redblacknode<pagechain> *rootnode, *pagenode, *dummynode;
 redblacktree<redblacknode<pagechain> >* nodetree;

 pagechain dummychain = pagechain(pagenumber);
 rootnode = static_cast<redblacknode<pagechain>*>(root);
 nodetree = static_cast<redblacktree<redblacknode<pagechain> >* >(tree);

dummynode = new redblacknode<pagechain>(dummychain);
 pagenode = nodetree->locatenode(dummynode, rootnode);
 pageinst* nextinstruction = new pageinst(instruction);

if (pagenode) {
 pagenode->getvalue().settail(nextinstruction);
 delete dummynode;
 } else {
 dummynode->getvalue().settail(nextinstruction);
 nodetree->insertnode(dummynode, rootnode);
 }
}

void writeinorder(void* tree, FILE* fileout)
{
 redblacktree<redblacknode<pagechain> >* nodetree =
 static_cast<redblacktree<redblacknode<pagechain> >*>
 (tree);

 writeoutpages(nodetree->root, fileout);
}

} //end extern "C"

The old code used Groovy’s Map class to index a sparse array; the new code uses a red-black tree to index the same data. The speed-up is not down to the tree’s superior performance alone, but that surely has a lot to do with it.

I now have to decide whether to re-write the whole of the testing code in C/C++ … I think I am going to have to.


The joy of .CXX


I have a problem.

The rise of the C guru. (LOL =]) (Photo credit: Dzhus)

I need to parse 220GB of XML to find an OPT reference string (i.e. to profile the memory references of a running application in order to build a page replacement algorithm that knows, at each point, the most efficient replacement decision).

I had one algorithm in mind for this and I wrote the Groovy code but it kept exhausting the memory pool of the VM (as it generated a lot of anonymous memory). Even whacking the VM up in size quite considerably didn’t seem to fix the problem.

So I used another algorithm – and it works. After a fashion. Because even with 18 threads running in parallel on the University of York’s big iron compute server I think it might take around two years to complete. And I don’t have two years.

So I need to find something better – something where I have much more control over the memory allocation code and where I can also be sure of performance.

C is the obvious answer. Or, in this case, C with an interface to some C++ code I wrote a few years ago to build a red-black tree.

The strange thing is that, despite Groovy’s claim to remove the really tedious bits of code-writing by doing away with boilerplate, this weekend I have found writing all that getter/setter code quite therapeutic: perhaps it is a programmer’s comfort blanket?


How slow is a fast computer?


Valgrind (Photo credit: Wikipedia)

I am working on a simulation environment for a NoC (network-on-chip). The idea is that we can take a memory reference string – generated by Valgrind – and test how long it would take to execute under different memory models for the NoC. It is at an early stage and still quite crude.

The data set it is working with is of the order of 200GB, though that covers 18 threads of execution and so, very roughly speaking, it is 18 data sets of just over 10GB each. I have written some parallel Groovy/Java code to handle it and the code seems to work, though there is a lot of work to be done.

I am running it on the University of York’s compute server – a beast with 32 cores and a lot of memory. But it is slow, slow, slow. My current estimate is that it would take about 10 weeks to crunch a whole dataset. The code is slow because we have to synchronise the threads to model the inherent parallelism of the NoC. The whole thing is a demonstration – with a vengeance – of Amdahl’s Law.
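Amdahl’s Law makes the point starkly: if a fraction p of the work parallelises across n cores, the overall speedup is 1 / ((1 - p) + p/n), so the serial, synchronised portion quickly dominates. A one-liner makes the arithmetic concrete:

```cpp
#include <cassert>

// Amdahl's Law: with parallel fraction p of the work spread over n
// cores, overall speedup is 1 / ((1 - p) + p / n).
double amdahl_speedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
}
```

Even on a 32-core machine, if only half the work escapes the synchronisation barrier the speedup cannot reach 2x – which is roughly the wall this simulator has hit.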

Even in as long a project as a PhD I don’t have 10 weeks per dataset going free, so this is a problem!