Third time lucky?

Last time we met, my PhD supervisor told me to expect to spend a long time making things that didn’t work: it

certainly feels like that right now.

My current task is to build a logical model of a working memory allocation scheme for a NoC.

I started with some Groovy, then realised that was going nowhere – how could I test these Groovy classes? I could write a DSL, but that felt like I’d be putting all the effort into the wrong thing.

My next thought was – write some new system calls for an experimental Linux kernel. Well, that has proved to be a pain – writing system calls is a bit of a faff (and nowhere does it seem to be fully documented – for a current kernel as opposed to a 2.2 one! – presumably because nobody should really be writing new Linux system calls anyway and so its knowledge best confined to the high priests of the cult) and testing it is proving to be even more difficult: it’s inside a VM or nothing.

Then I thought this afternoon – why bother with the kernel anyway – if I wrote a userland replacement for malloc that allocated from a fixed pool that should work just as well – so that is what I am about to try.

Virtual memory and a new operating system

This is going to be one of those blog posts where I attempt to clarify my thoughts by writing them down … which also means I might change my mind as I go along.

My problem is this: I have elected to, as part of my PhD, explore the prospect of building a virtual memory system for Network-on-Chip (NoC) computers. NoCs have multiple processors – perhaps 16, or 64 or (in the near future) many more, all on one piece of silicon and all connected by a packet-switched network. The numbers are important – because having that many processors (certainly a number greater than 16) means that the, so far more typical, bus-based interconnects do not work and that also means that the different processors cannot easily be told which other processor is trying to access the same slither of off-chip memory that they are after.

As a result, instead of increasing computing speed by seeing more processors crunch a problem in parallel, the danger is that computing efficiency falls off because either (A) each processor is confined to a very small patch of memory to ensure it does not interfere with other processors’ memory accesses, or (B) some very complicated and expensive (in time) logic is applied to ensure that each processor does know what accesses are being made, or (C) some combination of the above e.g., a private area which the processor can access freely and a shared area where some logic in software polices accesses.

None are perfect – (A) could limit processor numbers, (B) could be slow while (C) could be slow and also not work so well, so limiting processor numbers. So (C) is the worst of both worlds? Well, (C) is also, sort-of, my area of exploration!

Other researchers have already built a virtual memory system for another NoC, the Intel 48 core SCC. I don’t want to just repeat their work (I doubt that would impress my examiners either) in any case, so here are, roughly my thoughts:

• There is a choice between a page-based VM and one that manages objects. As an experimental idea the choice of managing objects quite appeals – but it also seems difficult to have a system that was efficient and managed objects without that being on top of some sort of page-based system.
• What is the priority for a VMM? To provide a shared space for the operating system and its code (too easy?), or to deliver memory to applications? Should this then be a virtual machine layer underneath the operating system? (This is what the SCC folk did – RockyVisor).
• Given that message passing seems a better fit for NoCs than shared memory in any case – how should message passing systems integrate with a VMM? Should we go down the route advocated by the builders of the Barrelfish operating system and absolutely rule out shared memory as a basis of processor interco-operation – just using the VMM as a means of allocating memory rather than anything else? (I think, yes, probably)
• But if the answer to the above is ‘yes’ are we sacrificing efficiency for anti-shared memory dogma? I worry we may be.

Any thoughts would be very welcome.

(I found a good – and reasonably priced – book that describes a working paging system along the way – What Makes It Page?: The Windows 7 (x64) Virtual Memory Manager).

Graph cartesian products

On the basis that knowing some more about Graph Theory won’t do me any harm when thinking about operating system behaviour, I am reading about that too right now.

But I found the book’s explanation of a Graph Cartesian Product rather less than full, so here is my attempt to make it a bit clearer.

Say we have graph $G_1$ with vertices $(v_1, v_2, v_3)$ and graph $G_2$ with vertices $(w_1, w_2, w_3, w_4)$, then our cartesian product graph is $G_3$ with vertices $((v_1, w_1), (v_1, w_2), (v1, w_3), (v_2, w_4),(v_2, w_1), (v_2, w_2), (v2, w_3), (v_2, w_4),(v_3, w_1), (v_3, w_2), (v3, w_3), (v_3, w_4))$

Which vertices in this new graph are adjacent?

The vertices are adjacent if – and only if – taking the new vertices to be of the form $(v_n, w_n) (v^{\prime}_n, w^{\prime}_n)$ – if $v_n = v^{\prime}_n$ and $w_n$ is adjacent to $w^{\prime}_n$ in $G_2$ OR $w_n = w^{\prime}_n$ and $v_n$ is adjacent to $v^{\prime}_n$ in $G_1$.

A GUI for Metapost?

I have sort-of abandoned my Apple Air Book for serious work this last week – going back to a 2008/9 Toshiba laptop (another Morgan Computers purchase) running Linux.

The Apple is a lovely device to travel with and is beautiful, if extremely expensive, device with which to browse the web, but a decade of conditioning to Linux and its command-line power and orthogonal tool set means I am much happier even with a slower machine when it comes to doing things like drawing figures with Metapost.

But having extolled the power of the command line I am wondering whether I should build a GUI for Metapost – essentially an editor panel coupled with a EPS display panel.

Metapost users seem thin on the ground – though maybe that is because a GUI tool doesn’t exist – but anyone who does use it care to comment?

The problem with Apple kit (part one?)

Last September I joined a startup, Centreground Political Communications and, like my three fellow employees, have been using Apple equipment more or less since then.

It is good quality kit, focused on (typical) user experience: like Windows done right. And, yes, as a Unix/Linux person I also get a bash shell and access to forty years of engineering excellence.

But it is difficult not to see the faults also:

• The one button mouse – a legacy of the 1980s and one that Apple must recognise is a poor one (why else have all those Ctrl dependent commands?
• The emphasis of design over function – no ethernet port, and a wireless performance massively below even the least powerful IBM compatible
• Audio that works out of the box – if you can ever hear it

Working on filesystem code

Nearly a decade ago I wrote a crude, but working, filesystem for the Sega Dreamcast VMU on Linux. I then put ported a very simple web server to the Dreamcast and got the whole thing on Slashdot.

I never managed to get the thing into mainline – indeed the battering I got last time I tried, in 2009, more or less made me give up writing anything for the kernel and the Dreamcast was put away.

I am not pretending my code was pretty or even particularly good but it is no wonder people get put off from contributing when you get pig ignorant comments like these:

Everything about the on-disk format is tied to the VMU. Until that sinks in, don’t bother sending me email, thanks.

This was someone, who ought to have known better, claiming that it was not possible to have a free standing filesystem for this at all – though they were making their, incorrect, claim in the manner seen all too frequently on the Linux Kernel Mailing List.

No comments.  Really.  There must be some limits on the language one is willing to use on public maillist, after all.

As you can tell this person – a top flight Linux hacker – did not like my code. And, looking back, I can hardly blame him, it was pretty ugly. But as a help to fix it this is of next to no use – and only serves to demotivate. Nasty computer scientists, again.

Ok, so I have got that off my chest. And I am once more placing myself in the firing line.

The filesystem code, a work in progress (so check where the code has got to once you click the link), can be found here. A filesystem that you should be able to mount under loopback, can be found here. All testers welcomed with open arms.

The cost of soft/minor faults

Here is a complete graph of the memory use and fault count for a run of ls -lia.

As you can see there are only soft/minor faults here – as one would expect (this was a machine with a lot of memory), as the C library provides the ls function and it will be loaded in memory (presumably the filesystem information was also in memory).

But there are a lot of soft faults – and these too have a cost, even if nothing like the cost of a hard fault. For a start each soft page fault almost certainly indicates a miss in the processor cache.

The paper linked here also gives a lot of information about Linux’s generation of soft/minor faults and their performance impact – it seems that the kernel is designed to deliver system wide flexibility at the expense of performance.

Counting soft and hard faults

When a running program references a page of memory that is not mapped into its address space the operating system throws a “page fault” – calling some kernel code to ensure that the page is loaded and mapped, or if the address referenced is not legal, that an appropriate error (a seg fault on x86) is signalled and the program’s execution stopped.

If the address is ‘legal’ then two types of fault exist – a ‘hard’ fault where the missing memory (eg some code) has to be loaded from disk or a ‘soft’ fault where the missing page is already in memory (typically because it is in a shared library and being used elsewhere) and so all that has to happen is for the page to be mapped into the address space of the executing program.

(The above is all a simplification, but I hope it is clear enough.)

Soft faults, as you might expect are handled much faster than hard faults – as disk access is generally many orders of magnitude slower than memory access.

Memory management and paging policies are generally designed to minimise the number of faults, especially hard faults.

So – what is the ratio of hard and soft faults? I have further extended the valext program I wrote for my MSc project to count just that – and it seems that on a loaded Linux system soft faults are generally an order of magnitude more common than hard faults even when launching a new program form ‘scratch’ (eg I am seeking to run an instance of ‘audacity’ under valext – and after executing 326,000 instructions there have been 274 soft faults and 37 hard faults).

That is good, of course, because it makes for faster, more efficient computing. But it also means that further optimising the paging policy of Linux is tough – hard faults take time so you can run a lot of optimising code and hope to have a better performance if you cut the number of hard faults even only slightly. But if soft faults out number hard faults by 10 to 1 then running a lot of extra code to cut the number of faults may not be so beneficial.

(You can download valext at github – here NB: right now this extra feature is in the ‘faultcount’ branch – it will be merged into master in due course.)

Best book on Linux kernel internals

Write an MSc project report means having to read a lot of source code and constantly referring to texts in the hope that they will make things clearer.

I have three books on the kernel – there are obviously others, but I think two of these three will be familiar to most kernel hackers – but it is the third that I rate most highly.

Understanding the Linux Kernel: for many this must feel like the standard text on the kernel – it’s published by O’Reilly, so (my experience with their XML Pocket Reference not withstanding) will be good, it’s well printed and readable and it is, after all, the third edition of a book your forefathers used. But the problem is, it is also now six years old and  a lot has happened since then. I well remember going into Foyles in the autumn of 2005 and seeing it newly minted on the shelves. For a long time it could hide behind the fact that the kernel was still in the 2.6 series, but even that protection is gone. Verdict: Venerable but getting close to past it.

Linux Kernel Development: the previous, second, edition of this book was a fantastic introduction to how the kernel worked and was written in a slightly whimsical tone which made it easier to read. It is rare that one can read a computer book like a novel, starting from the first page and going on to the end, but you could with that. Sadly someone seems to have got to Robert Love (who, from personal experience I know to be a great guy) and presumably told him he had to be more serious if he wanted his book to be a set text for CS courses. The problem is that the book now falls between two stools – somewhere between a solid but broad introduction to how the kernel works and a guide to writing kernel code. Unfortunately,  it does not quite hit either target. That said, it is still worth having. Verdict: Good, but where did the magic go?

Professional Linux Kernel Architecture : unfortunately, everything about the Wrox brand suggests “cheap and nasty”, which is a real pity, as this book is the best of the bunch. Admittedly it would not, as Robert Love’s book, be at all suitable as a primer for operating system study – it is far too big and detailed for that. But if you are looking for an up to date (or reasonably so, anyway) helpmate for actually patching the kernel, then this seems to be the best choice. Sadly the cheap and nasty side does creap through on occasion with bad editing/translation, but it’s not enough to stop me from recommending it. Verdict: this one’s a keeper, for now any way.