## Third time lucky?

Last time we met, my PhD supervisor told me to expect to spend a long time making things that didn’t work: it certainly feels like that right now.

My current task is to build a logical model of a working memory allocation scheme for a NoC.

I started with some Groovy, then realised that was going nowhere – how could I test these Groovy classes? I could write a DSL, but that felt like I’d be putting all the effort into the wrong thing.

My next thought was to write some new system calls for an experimental Linux kernel. Well, that has proved to be a pain. Writing system calls is a bit of a faff, and nowhere does it seem to be fully documented for a current kernel, as opposed to a 2.2 one (presumably because nobody should really be writing new Linux system calls anyway, and so the knowledge is best confined to the high priests of the cult). Testing is proving even more difficult: it’s inside a VM or nothing.

Then I thought this afternoon: why bother with the kernel anyway? If I wrote a userland replacement for malloc that allocated from a fixed pool, that should work just as well. So that is what I am about to try.
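To make the fixed-pool idea concrete, here is a minimal sketch of a logical model, written in Python rather than C purely to keep it short. Everything in it is an assumption for illustration: the `FixedPool` class, the first-fit policy and the use of offsets in place of pointers are hypothetical choices, not a description of the allocator I will end up with.

```python
class FixedPool:
    """Toy first-fit allocator over a fixed-size pool (hypothetical model)."""

    def __init__(self, size):
        self.pool = bytearray(size)    # the only memory we may hand out
        self.free_list = [(0, size)]   # sorted (offset, length) holes
        self.allocated = {}            # offset -> length of live blocks

    def malloc(self, nbytes):
        """Return the offset of a block of nbytes, or None if nothing fits."""
        if nbytes <= 0:
            return None
        for i, (offset, length) in enumerate(self.free_list):
            if length >= nbytes:
                if length == nbytes:
                    del self.free_list[i]          # hole used up exactly
                else:
                    # carve the block off the front of the hole
                    self.free_list[i] = (offset + nbytes, length - nbytes)
                self.allocated[offset] = nbytes
                return offset
        return None                                # pool exhausted

    def free(self, offset):
        """Return a block to the pool and merge adjacent holes."""
        length = self.allocated.pop(offset)        # KeyError on a bad or double free
        self.free_list.append((offset, length))
        self.free_list.sort()
        merged = [self.free_list[0]]
        for off, ln in self.free_list[1:]:
            last_off, last_ln = merged[-1]
            if last_off + last_ln == off:          # holes touch: coalesce
                merged[-1] = (last_off, last_ln + ln)
            else:
                merged.append((off, ln))
        self.free_list = merged
```

The point of the model is only that allocation can never reach beyond the pool: once the holes are used up, `malloc` returns `None` instead of asking the operating system for more.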

## Virtual memory and a new operating system

This is going to be one of those blog posts where I attempt to clarify my thoughts by writing them down … which also means I might change my mind as I go along.

My problem is this: I have elected, as part of my PhD, to explore the prospect of building a virtual memory system for Network-on-Chip (NoC) computers. NoCs have multiple processors – perhaps 16, or 64 or (in the near future) many more – all on one piece of silicon and all connected by a packet-switched network. The numbers are important: having that many processors (certainly any number greater than 16) means that the bus-based interconnects that have been more typical so far do not work, and that in turn means the different processors cannot easily be told which other processor is trying to access the same sliver of off-chip memory that they are after.

As a result, instead of increasing computing speed by seeing more processors crunch a problem in parallel, the danger is that computing efficiency falls off because either (A) each processor is confined to a very small patch of memory to ensure it does not interfere with other processors’ memory accesses, or (B) some very complicated and expensive (in time) logic is applied to ensure that each processor does know what accesses are being made, or (C) some combination of the above e.g., a private area which the processor can access freely and a shared area where some logic in software polices accesses.

None are perfect – (A) could limit processor numbers, (B) could be slow while (C) could be slow and also not work so well, so limiting processor numbers. So (C) is the worst of both worlds? Well, (C) is also, sort-of, my area of exploration!

Other researchers have already built a virtual memory system for another NoC, the Intel 48-core SCC. I don’t want to just repeat their work (I doubt that would impress my examiners in any case), so here, roughly, are my thoughts:

• There is a choice between a page-based VM and one that manages objects. As an experimental idea the choice of managing objects quite appeals – but it also seems difficult to have a system that is efficient and manages objects without that sitting on top of some sort of page-based system.
• What is the priority for a VMM? To provide a shared space for the operating system and its code (too easy?), or to deliver memory to applications? Should this then be a virtual machine layer underneath the operating system? (This is what the SCC folk did – RockyVisor).
• Given that message passing seems a better fit for NoCs than shared memory in any case – how should message passing systems integrate with a VMM? Should we go down the route advocated by the builders of the Barrelfish operating system and absolutely rule out shared memory as a basis of processor co-operation – just using the VMM as a means of allocating memory rather than anything else? (I think, yes, probably)
• But if the answer to the above is ‘yes’, are we sacrificing efficiency for anti-shared memory dogma? I worry we may be.

Any thoughts would be very welcome.

(I found a good – and reasonably priced – book that describes a working paging system along the way – What Makes It Page?: The Windows 7 (x64) Virtual Memory Manager).

## Graph cartesian products

On the basis that knowing some more about Graph Theory won’t do me any harm when thinking about operating system behaviour, I am reading about that too right now.

But I found the book’s explanation of a Graph Cartesian Product rather less than full, so here is my attempt to make it a bit clearer.

Say we have graph $G_1$ with vertices $(v_1, v_2, v_3)$ and graph $G_2$ with vertices $(w_1, w_2, w_3, w_4)$, then our cartesian product graph is $G_3$ with vertices $((v_1, w_1), (v_1, w_2), (v_1, w_3), (v_1, w_4), (v_2, w_1), (v_2, w_2), (v_2, w_3), (v_2, w_4), (v_3, w_1), (v_3, w_2), (v_3, w_3), (v_3, w_4))$.

Which vertices in this new graph are adjacent?

Two vertices $(v, w)$ and $(v^{\prime}, w^{\prime})$ of the new graph are adjacent if – and only if – either $v = v^{\prime}$ and $w$ is adjacent to $w^{\prime}$ in $G_2$, OR $w = w^{\prime}$ and $v$ is adjacent to $v^{\prime}$ in $G_1$.
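The adjacency rule can be checked mechanically. The sketch below is one way of doing so, under assumptions of my own: the function name `cartesian_product` is hypothetical, graphs are undirected, and each edge is stored as a `frozenset` of its two endpoints.

```python
from itertools import product


def cartesian_product(vertices1, edges1, vertices2, edges2):
    """Return (vertices, edges) of the Cartesian product of two undirected graphs.

    Each edge set contains frozensets holding the two endpoints of an edge.
    """
    vertices = list(product(vertices1, vertices2))
    edges = set()
    for (v, w), (v2, w2) in product(vertices, repeat=2):
        # same first coordinate, second coordinates adjacent in the second graph
        same_v_adjacent_w = v == v2 and frozenset((w, w2)) in edges2
        # same second coordinate, first coordinates adjacent in the first graph
        same_w_adjacent_v = w == w2 and frozenset((v, v2)) in edges1
        if same_v_adjacent_w or same_w_adjacent_v:
            edges.add(frozenset(((v, w), (v2, w2))))
    return vertices, edges
```

For example, if we assume (the text above does not say) that $G_1$ is the path $v_1 - v_2 - v_3$ and $G_2$ is the path $w_1 - w_2 - w_3 - w_4$, the product has 12 vertices and $3 \times 3 + 4 \times 2 = 17$ edges: one copy of $G_2$'s edges for each vertex of $G_1$, plus one copy of $G_1$'s edges for each vertex of $G_2$.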

## A GUI for Metapost?

I have sort-of abandoned my Apple Air Book for serious work this last week – going back to a 2008/9 Toshiba laptop (another Morgan Computers purchase) running Linux.

The Apple is a lovely device to travel with and is a beautiful, if extremely expensive, device with which to browse the web, but a decade of conditioning to Linux and its command-line power and orthogonal tool set means I am much happier even with a slower machine when it comes to doing things like drawing figures with Metapost.

But having extolled the power of the command line I am wondering whether I should build a GUI for Metapost – essentially an editor panel coupled with an EPS display panel.

Metapost users seem thin on the ground – though maybe that is because a GUI tool doesn’t exist – but would anyone who does use it care to comment?

## The problem with Apple kit (part one?)

Last September I joined a startup, Centreground Political Communications and, like my three fellow employees, have been using Apple equipment more or less since then.

It is good quality kit, focused on (typical) user experience: like Windows done right. And, yes, as a Unix/Linux person I also get a bash shell and access to forty years of engineering excellence.

But it is difficult not to see the faults also:

• The one button mouse – a legacy of the 1980s and one that Apple must recognise is a poor one (why else have all those Ctrl-dependent commands?)
• The emphasis of design over function – no ethernet port, and wireless performance massively below even the least powerful IBM compatible
• Audio that works out of the box – if you can ever hear it

## Working on filesystem code

Nearly a decade ago I wrote a crude, but working, filesystem for the Sega Dreamcast VMU on Linux. I then ported a very simple web server to the Dreamcast and got the whole thing on Slashdot.

I never managed to get the thing into mainline – indeed the battering I got last time I tried, in 2009, more or less made me give up writing anything for the kernel and the Dreamcast was put away.

I am not pretending my code was pretty or even particularly good but it is no wonder people get put off from contributing when you get pig ignorant comments like these:

“Everything about the on-disk format is tied to the VMU. Until that sinks in, don’t bother sending me email, thanks.”

This was someone who ought to have known better, claiming that it was not possible to have a free-standing filesystem for this at all – and making that incorrect claim in the manner seen all too frequently on the Linux Kernel Mailing List.

“No comments.  Really.  There must be some limits on the language one is willing to use on public maillist, after all.”

As you can tell, this person – a top flight Linux hacker – did not like my code. And, looking back, I can hardly blame him: it was pretty ugly. But as help in fixing it, this is of next to no use, and only serves to demotivate. Nasty computer scientists, again.

Ok, so I have got that off my chest. And I am once more placing myself in the firing line.

The filesystem code, a work in progress (so check where the code has got to once you click the link), can be found here. A filesystem that you should be able to mount under loopback can be found here. All testers welcomed with open arms.

## The cost of soft/minor faults

Here is a complete graph of the memory use and fault count for a run of ls -lia.

As you can see, there are only soft/minor faults here. That is as one would expect on a machine with a lot of memory: the C library that ls depends on will already be loaded in memory, and presumably the filesystem information was in memory too.

But there are a lot of soft faults – and these too have a cost, even if nothing like the cost of a hard fault. For a start each soft page fault almost certainly indicates a miss in the processor cache.
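For anyone who wants to see soft faults being generated, here is a small sketch that counts them. It is not how the graph above was produced. It assumes a Unix-like system where Python's `resource` module is available, and the helper names `minor_faults` and `touch_pages` are made up for the example.

```python
import mmap
import resource


def minor_faults():
    """Minor (soft) page faults charged to this process so far."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt


def touch_pages(n_pages):
    """Map n_pages of fresh anonymous memory, write one byte to each page,
    and return how many minor faults were counted while doing so."""
    page = mmap.PAGESIZE
    before = minor_faults()
    region = mmap.mmap(-1, n_pages * page)   # anonymous mapping, no file behind it
    for i in range(n_pages):
        region[i * page] = 1                 # first touch of each page
    after = minor_faults()
    region.close()
    return after - before
```

On Linux I would expect `touch_pages(256)` to report something in the region of one fault per page, since each first touch has to be wired up by the kernel even though nothing is read from disk. The exact figure depends on the kernel and its settings (transparent huge pages, for instance, can lower it), so treat any number it prints as indicative only.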

The paper linked here also gives a lot of information about Linux’s generation of soft/minor faults and their performance impact – it seems that the kernel is designed to deliver system wide flexibility at the expense of performance.