Picking which page to remove


I have now written some very basic but functional paging code for the Microblaze but have realised I am, essentially, going to have to start a lot of it again.

The Microblaze puts page tables firmly under software control and its memory management unit (MMU) concentrates on managing page references through its translation lookaside buffers (TLBs) instead. So a reference to a virtual address that is mapped in a TLB is handled invisibly to the user or programmer, but a reference to a memory address that may be mapped in a page table but not in the TLB generates an exception and everything is dumped into the lap of the programmer – this is what is often known as a “soft fault” (a “hard fault” being when there is no version of the page being referenced available in physical memory at all).

The number of TLB entries is quite limited – just 64 – so to handle (or in my case, simulate), say, 512kB of physical memory through 4kB page frames (128 frames, twice what the TLB can map at once) you need to expect to handle quite a few soft as well as hard faults. So you need to take a view on which mappings get evicted from the TLB and – if you are referencing more than 512kB of program and data – which pages get evicted from the page table and, hence, physical memory.

This is where the difficulties start, because there is no simple way to know when a read through a TLB mapping takes place – the very point is that it is invisible (and hence quick) to the programmer or any software she or he may have written. The problem with that is that – for most situations – the only guide you have to future behaviour in terms of memory references is past behaviour: so it would be very helpful to know whether a page has been recently accessed (on the basis that if it was accessed recently then it will be accessed again soon).

The standard response to this sort of problem is some form of “CLOCK” algorithm – which was first implemented, I believe, for the Multics operating system.

Multics can be thought of as the estranged (and now late) father of Unix – the “Uni” being a play on the “Multi” of the earlier system – and both directly and through its offspring its influence on computers has been profound. One of our inheritances is CLOCK, some version of which is almost certainly running in the computer, phone or tablet on which you are reading this.

The principle of CLOCK is simple. A “clock hand” sweeps regularly through the page mappings, marking them invalid. A subsequent attempt to use one of those mappings faults and the valid bit has to be set again (which tells us the page has been used recently). Alternatively, if a new mapping is needed, we can throw out the first page in the list of mappings that is still marked as invalid.
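The idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Microblaze code: the class name `ClockTable`, its methods and the three result labels are all made up for illustration, and the victim search uses the usual "second chance" scan (clearing valid bits as the hand passes) rather than anything specific to the Microblaze MMU.

```python
class ClockTable:
    """Toy model of a fixed-size table of page mappings managed with CLOCK.

    All names here are hypothetical; this is a simulation, not MMU code.
    """

    def __init__(self, slots):
        self.pages = [None] * slots    # page number held in each slot
        self.valid = [False] * slots   # True = used since the hand last passed
        self.hand = 0                  # next slot the clock hand will visit

    def sweep(self, steps):
        """Timer tick: the hand marks the next `steps` mappings invalid."""
        for _ in range(steps):
            self.valid[self.hand] = False
            self.hand = (self.hand + 1) % len(self.pages)

    def _find_victim(self):
        """Return the first slot at or after the hand that is marked invalid."""
        # Valid bits are cleared as the hand passes, so two laps always suffice.
        for _ in range(2 * len(self.pages)):
            slot = self.hand
            self.hand = (self.hand + 1) % len(self.pages)
            if not self.valid[slot]:
                return slot
            self.valid[slot] = False
        raise RuntimeError("unreachable: a full lap clears every valid bit")

    def access(self, page):
        """Reference a page and return (kind, evicted_page)."""
        if page in self.pages:
            slot = self.pages.index(page)
            if self.valid[slot]:
                return ("hit", None)       # invisible to software on real hardware
            self.valid[slot] = True        # fault on an invalidated mapping:
            return ("soft", None)          # re-validate, page was recently used
        slot = self._find_victim()         # no mapping at all: make room
        evicted = self.pages[slot]
        self.pages[slot] = page
        self.valid[slot] = True
        return ("hard", evicted)
```

With three slots, touching pages 1, 2 and 3 fills the table. After a full `sweep(3)`, touching page 2 again is reported as a soft fault and re-validates it, so a later reference to a new page 4 evicts page 1 rather than page 2.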

And this, or some version of it, is what I am now going to have to implement for the Microblaze. The obvious thing to do is to have some sort of timer interrupt drive the clock hand – though I am not even sure the Microblaze has a timer interrupt available. I’m guessing it doesn’t, as it would expect those to come from the board, so this could be tricky!

Time to write a signal handler?


Unix Creators at DEC PDP11 (Photo credit: PanelSwitchman)

I am trying to execute some self-written pieces of software that require a lot of wall clock time – around three weeks.

I run them on the University of York’s compute server, which is rebooted on the first Tuesday of every month, so the window for the software is limited. I have until about 7 am on 5 August before the next reboot.

To add to the complication, the server runs Kerberos, which does not seem to play well with the screen/NFS combination I am using.

And I keep killing the applications in error. This time, just half an hour ago, I assumed I was on a dead terminal session (i.e. an ssh login which had long since expired) and pressed ctrl-C, only to discover to my horror that it was a live screen (it had not responded to ctrl-A, ctrl-A for whatever reason).

Time to add a signal handler to catch ctrl-C to at least give me the option of changing my mind!
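As a rough sketch of what such a handler might look like, here is one in Python. It assumes a Python program purely for illustration (the same idea should carry over to a C program using `sigaction`), and the class name `ConfirmInterrupt`, the `install` helper and the five-second window are all inventions for this example. The first ctrl-C only prints a warning, and a second one inside the window really interrupts the program.

```python
import os
import signal
import time


class ConfirmInterrupt:
    """SIGINT handler that only quits on a second ctrl-C within `window` seconds."""

    def __init__(self, window=5.0, clock=time.monotonic):
        self.window = window   # seconds in which a second ctrl-C counts
        self.clock = clock     # injectable so the logic can be tested
        self.last = None       # time of the most recent unconfirmed ctrl-C

    def __call__(self, signum, frame):
        now = self.clock()
        if self.last is not None and now - self.last <= self.window:
            # Second ctrl-C in time: behave as the default handler would.
            raise KeyboardInterrupt
        self.last = now
        # os.write avoids re-entering Python's buffered I/O from a handler.
        msg = "ctrl-C caught: press it again within %gs to really quit\n" % self.window
        os.write(2, msg.encode())


def install(window=5.0):
    """Install the handler for SIGINT and return it."""
    handler = ConfirmInterrupt(window)
    signal.signal(signal.SIGINT, handler)
    return handler
```

Calling `install()` once near the start of a long-running job is enough. A stray ctrl-C then costs nothing but a line on the terminal.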

Give yourself a Christmas present: learn sed


A Shebang, also Hashbang or Sharp bang. (Photo credit: Wikipedia)

Text is at the core of The Unix Way – and all True Unix Hackers work from the command line. This much you know.

(If you don’t, get a copy of The Art of Unix Programming – there is an awful lot of rubbish in that book but it does do one thing well: explain the deep connection between text and Unix.)

In a practical sense this means that to get the best from your Unix system (and this includes you if you are a Mac OS X user) you need to boost your command line skills. The first thing to do is, of course, become familiar with a text editor – either vi or emacs (I am a vi user, but refuse to engage in a religious war on this matter).

Then, perhaps not the next thing, but one of the next things you should do is learn sed – the stream editor – one of the many gifts to the world (including Unix, of course) from Bell Labs (I recently read The Idea Factory: Bell Labs and the Great Age of American Innovation and I suppose I really ought to get around to writing a review of that).

Sed comes from the 1970s but, as so often in computing, it feels to me that its time has come again: in the era of big data there is a real need for a program that allows you to edit a file one line at a time, as opposed to trying to read as much of the file as possible into your computer’s memory.

If you are sufficiently long in the tooth to have messed about with Microsoft’s edlin or any other line editor you might be forgiven for giving a hollow laugh at this point – but sed is a tool that genuinely repays the effort you have to make to learn it.

In the last few weeks I have been messing about with 220GB XML files and even the University of York’s big iron compute server cannot handle a buffered edit of a file that size – sed is the only realistic alternative. (Actually I thought about using my own hex editor – hexxed – which is also essentially a line editor, but a hex editor is really for messing about with binary files and I wouldn’t recommend it.)

Sed has allowed me to fix errors deep inside very large files with just a few commands – eg:

LANG=C sed "51815253s@^.*\$@<instruction address='004cf024' size='03' />@" infile.xml >outfile.xml

Fixes line 51,815,253 in my file (the line identified by an XML fatal error). Earlier I had executed another line of sed to see what was wrong with that line:

LANG=C sed -n '51815253p' infile.xml

(The LANG=C prefix is because the breakage involved an alien locale seemingly being injected into my file.)
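To make the line-at-a-time principle concrete, here is a minimal Python sketch of the same two operations (print line N, replace line N). It is not how sed is implemented, and the function names `read_line` and `replace_line` are made up for this illustration. Files are opened in binary mode so that stray non-UTF-8 bytes pass through untouched, which plays roughly the role of the `LANG=C` prefix above. Only one line is ever held in memory, so the size of the file does not matter.

```python
def read_line(path, lineno):
    """Return line `lineno` (1-based) as bytes, or None if the file is shorter."""
    with open(path, "rb") as src:
        for n, line in enumerate(src, start=1):
            if n == lineno:
                return line
    return None


def replace_line(src_path, dst_path, lineno, new_line):
    """Copy src to dst one line at a time, swapping in `new_line` at `lineno`."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for n, line in enumerate(src, start=1):
            if n == lineno:
                # Keep the original line ending, if the line had one.
                ending = b"\n" if line.endswith(b"\n") else b""
                dst.write(new_line + ending)
            else:
                dst.write(line)
```

For a real 220GB file sed will be far quicker, but the shape of the work is the same: read a line, decide, write a line.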

Sed allows you to do much more – for instance anything you can identify through a pattern can be altered. Let’s say you have (text) documents with your old email address – me@oldaddress.com – and you want to change that to your new address – me@newaddress.com …

sed 's/me@oldaddress\.com/me@newaddress.com/g' mytext.txt > newtext.txt

Then check newtext.txt for correctness before using mv to replace the original.

But there is much, much more you can do with it.

Plus you get real cred as a Unix hacker if you know it.

Now, too many programs these days – especially anything from Redmond – go out of their way to suppress text formats. Text, after all, is resistant to the “embrace and extend” methodology – text wants to be free. But there is plenty of it out there still.

Books that teach you about sed are not so plentiful – I have been reading an old edition of sed & awk – which seems to be out of print – though you can buy a second hand copy for less than a quid excluding postage costs. Well worth the investment, I’d say.

Twenty years of Windows NT


Windows NT 3.1 (Photo credit: Wikipedia)

Today is the twentieth anniversary of the launch of Windows NT. In fact I have been using it (I still have to on occasion, in the sense that Windows XP/7/8 are NT) for a bit longer than that, as I was a member of the NT beta programme. I even managed to write some software for it using the 32-bit SDK before it was officially released (a not very good version of exie-ohsies/noughts-and-crosses/tic-tac-toe – so poor was the coding that you could beat the computer every time if you knew its weakness).

NT would not run on my Amstrad 386 and in the end I bought a cheap 486 machine to match the software – it was a lot of fun in the early days – though I shudder to think of trying to get by, day by day, on a machine with such a weak command line.

One thing I remember was the late, great, Byte magazine running an issue in the summer of 1992 predicting that NT would be the death of Unix. I even thought that was right – Unix was for expensive workstations and now us users of commodity hardware were to get a proper multi-tasking 32 bit operating system – who needed time-sharing when we could all have PCs of our own?

Plan9 on the Raspberry Pi


Glenda, the Plan 9 Bunny (Photo credit: Wikipedia)

“Plan 9 from Bell Labs” was meant to be the successor system to Unix and, like the original, was designed and built by AT&T’s engineers at Bell Labs (the title is, of course, a skit on what is supposedly the best worst-ever film – “Plan 9 from Outer Space”).

Plan 9 never really made it. Linux came along and gave us Unix for the masses on cheap hardware for free and the world moved on. (Though some of the ideas in Plan 9 were retro-fitted into Linux and other Unix-like systems.)

The increased speed of commodity computers – latterly sustained via SMP – meant that computing power that once seemed only available to the elite could be found on the High Street, and clustering software that was easy to install and use meant scientists and others could build super-computers using cheap hardware and free software. The multi-computer idea at the heart of Plan 9 seemed to have been passed by as we screamed along the Moore’s Law Highway.

But now Moore’s Law is breaking down – or rather we are discovering that while the Law continues to apply – in other words we can still double the number of transistors on silicon every 18 – 24 months – other factors (heat dissipation essentially) mean we cannot translate a doubling of transistors into a computer that runs twice as fast. And so the multi-computer idea is of interest once more.

Plan 9 is not likely to be the operating system of the future. But as an actually existing multi-computer operating system it could still have a lot to teach us.

Now that it has been ported to run on the Raspberry Pi single board computer, I have decided to order another three of these things (I already have one running as a proxy server) and use them as Plan 9 nodes. The boards should be here in about three weeks (I hope), meaning I will have them as a Christmas present to myself.

The problem with Apple kit (part one?)


Last September I joined a startup, Centreground Political Communications, and, like my three fellow employees, have been using Apple equipment more or less since then.

It is good quality kit, focused on (typical) user experience: like Windows done right. And, yes, as a Unix/Linux person I also get a bash shell and access to forty years of engineering excellence.

But it is difficult not to see the faults also:

  • The one button mouse – a legacy of the 1980s and one that Apple must recognise is a poor one (why else have all those Ctrl dependent commands?)
  • The emphasis of design over function – no ethernet port, and a wireless performance massively below even the least powerful IBM compatible
  • Audio that works out of the box – if you can ever hear it

Adding vi-like functionality to Hexxed


A diagram showing the key Unix and Unix-like operating systems (Photo credit: Wikipedia)

I have decided that I will model the keyboard interface for Hexxed on vi.

I know that is not what many (if any) coming from outside the Unix world will expect, but then there are plenty of hex editors out there and I want to make one that will appeal to at least one niche.

As I instinctively type “:w” in all sorts of places these days, I think there will be some other people out there who might like that sort of functionality too.