I went to the University of Hertfordshire’s postgraduate open day today. There were a couple of strange feelings about the trip for political reasons: the one previous time I had been on the campus was when I had a not very pleasant encounter with the Revolutionary Socialist League/Militant Tendency-controlled student Labour club of what was then Hatfield Polytechnic at the very end of 1987, and on the way there I suddenly realised I was walking down the same street where I had spent the afternoon “knocking up” for Labour on 1 May 1997.
But my trip today was nothing to do with politics: I was simply checking them out as a potential PhD institution. It was a long way to go for a five-minute conversation, though it was still worth it as it clarified for me that I should concentrate my efforts on getting the MSc project done and not worry about a PhD just yet.
I was quite impressed by the university, though, for a few reasons. Firstly, most of the main campus is built in a particular civic style popular in Britain in the late 1940s and early 1950s: low-rise vernacular brick and glass, I am going to call it, as I don’t know any better. It’s a style I like because to me it suggests optimism, endeavour and public spirit after the huge financial and other hardships of the war – a war in which Britain’s willingness to sacrifice its finances was absolutely fundamental to the global victory over fascism and militarism.
Secondly, because they told a good tale of how they believe they are the best of the “new universities” and that they have backed up that claim by investing significantly in their infrastructure – they also said they run the largest university bus service in the world: so the Chinese are not beating Britain on every superlative yet!
My first exam in the second year of the (part-time) MSc is tomorrow and I guess I am writing this blog partly as a way of avoiding more revision, but partly also because if last year’s experience is any guide, that exam will knock the stuffing out of any optimism I have, so I shall write something now while I still have some hope.
The exams are not the end of the degree: if I pass them then technically I can claim a postgraduate diploma, but I already have one of those, in Journalism Studies from Westminster, and as was said to me at the time, “it’s just about worth the paper it is printed on”: I learnt a lot but nobody else is much impressed.
To get the degree I need to complete my project on memory management in the Linux kernel – it’s an ambitious project and time will be short so it may get frantic.
But when it’s over, what will I do? I don’t plan to work in IT: 45 seems quite an age to go from reasonable success and some prospects in one career to starting at the bottom of another.
But nor do I want to abandon science for a second time. A part-time PhD? That really is a long term commitment, though.
I started work last night on writing up a very early draft of my project proposal. For several reasons it was a lot more difficult than I had expected.
Firstly, and typically, I let the technology of the writing tool get in the way of the actual writing. I spent much more time fiddling with LyX and various templates than writing anything. Should I just write plain text in a word processor and then copy that into the LaTeX tool or soldier on with LyX (after all even the project proposal will have to include mathematical notation)?
Secondly, while I hoped to use Christmas, and the time off work, to find the time to work on this, doing without distraction means staying up until 3am, as the house is only quiet after 1am or later at Christmas. Not really sustainable.
Thirdly, and this is the most difficult one: all my mental images of what I was going to write (here is the hypothesis, currently framed as “a more thorough-going application of the working set concept in the Linux kernel will improve performance”, and everything else flows from that) just melted away.
Indeed I am sort of working on the idea that I really need to explain some of the core ideas and then present a hypothesis to be tested.
I stumbled across the site XMLSucks.com just now when reading a comment on Slashdot about the idea that there was an FBI-mandated “backdoor” in OpenBSD.
Right now I am working on some coursework with XML and so the site has my sympathy. For sure, XML has its uses – SVG seems like a pretty good idea to me and I have used it recently to generate graphics to represent the processes running on a Linux box.
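The tool I mention is not reproduced here, but the basic idea of generating SVG programmatically is easy to sketch. Below is a minimal, made-up example (not my actual coursework code) that emits one bar per process, with the bar width proportional to an invented resident-set-size figure:

```python
# Illustrative sketch only: build an SVG bar chart of process memory use.
# The process list is sample data, not read from /proc.

def processes_to_svg(procs, bar_height=20, scale=0.01):
    """procs: list of (name, rss_in_kb) tuples; returns an SVG document string."""
    bars = []
    for i, (name, rss) in enumerate(procs):
        y = i * (bar_height + 4)
        width = rss * scale
        bars.append(
            f'<rect x="0" y="{y}" width="{width:.1f}" height="{bar_height}" fill="steelblue"/>'
        )
        bars.append(
            f'<text x="{width + 4:.1f}" y="{y + bar_height - 5}">{name} ({rss} kB)</text>'
        )
    height = len(procs) * (bar_height + 4)
    return ('<svg xmlns="http://www.w3.org/2000/svg" width="400" height="%d">\n' % height
            + "\n".join(bars) + "\n</svg>")

svg = processes_to_svg([("init", 1200), ("bash", 3400), ("firefox", 21000)])
```

The attraction of SVG here is exactly that it is plain text: a few string operations produce a graphic any browser can render.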
But freely mixing it with HTML on the web? I am inclined to (mostly) agree with the statement on the site:
XML is bloated. XML is fugly. XML is only “human-readable” if you’re willing to stretch the definition of “human-readable.” The same goes for the proposed bloatware of HTML5. Anyone looking at the spec must be shaking their heads. Sure, it’s better than the now-abandoned XHTML 2.0, but that’s not saying much.
This is the first “normal” – not abroad or just back, not jet-lagged and so on – weekend I’ve been able to have at home in a month, and it has also been the first time in that period that I have been able to devote some time to looking further at my proposed MSc project – on extending working set heuristics in the Linux kernel.
The good news is that I am once more convinced of the utility of, and enthusiastic about the implementation of, the idea. At the risk of looking very naive in six months (or six weeks) time even in my own eyes – here is the core idea:
Peter Denning’s 1968 and 1970 papers on the working set and virtual memory made some bold claims – calling global page replacement algorithms “in general sub-optimal” and asserting that the working set method is the best practical guarantee against thrashing.
Windows NT and its derivatives (XP, Vista, 7 etc) reflect their heritage from VMS in using a working set based replacement policy.
In contrast Linux (and the Unix family generally) use global replacement policies: indeed a fairly simple clock algorithm stands at the centre of Linux’s page replacement policies. Kernel developers say the policy works well in practice and that, in effect, the active “least recently used” list of cached pages, against which the clock algorithm runs, is a list of pages in the working sets of running processes.
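For readers unfamiliar with the clock (second-chance) idea, here is a toy sketch. The real kernel machinery is far more involved (two LRU lists, per-page referenced and active flags), so treat this purely as illustration:

```python
# Toy clock (second-chance) replacement: sweep a "hand" over the frames,
# clearing reference bits, until a frame with a clear bit is found, then
# evict that frame's page.

def clock_replace(frames, ref_bits, hand, new_page):
    """Replace one page using the clock algorithm; returns the new hand position."""
    while True:
        if ref_bits[hand]:
            ref_bits[hand] = False      # referenced recently: give a second chance
            hand = (hand + 1) % len(frames)
        else:
            frames[hand] = new_page     # not referenced: evict and reuse the frame
            ref_bits[hand] = True
            return (hand + 1) % len(frames)

frames = ["a", "b", "c"]
ref_bits = [True, False, True]   # "b" has not been referenced recently
hand = clock_replace(frames, ref_bits, 0, "d")
# "b", the first frame found with a clear reference bit, is the victim
```

The appeal of the algorithm is its cheapness: one bit per page and a single sweeping pointer approximate LRU without timestamping every access.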
My essential idea is to seek to trim the active list on a process-by-process basis when the system is under high load (the long delay in execution caused by a page fault hopefully making it efficient to execute the extra code in the hope of reducing the number of page faults.) Pages from the active list that are owned by the processes with the biggest memory footprint will be dropped into the inactive list, so making it more likely they will be eventually swapped out.
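The trimming idea above can be sketched in a few lines of pure Python. Everything here – the data structures, the threshold, the function name – is invented for illustration; the real work would of course be kernel C against the actual LRU lists:

```python
# Illustrative sketch of the proposed heuristic: under memory pressure,
# demote pages owned by the biggest-footprint processes from the active
# list to the inactive list, making them the first swap-out candidates.

def trim_active(active, inactive, owners, pages_to_trim):
    """active/inactive: lists of page ids; owners: page id -> (pid, footprint).
    Demote pages belonging to the largest processes."""
    # Order active pages so those owned by the largest processes come first.
    victims = sorted(active, key=lambda p: owners[p][1], reverse=True)
    for page in victims[:pages_to_trim]:
        active.remove(page)
        inactive.append(page)

active = [1, 2, 3, 4]
inactive = []
# page id -> (owning pid, owner's footprint in pages) -- sample data
owners = {1: (100, 500), 2: (200, 9000), 3: (200, 9000), 4: (300, 50)}
trim_active(active, inactive, owners, 2)
# pages 2 and 3, owned by the 9000-page process, are demoted
```

The bet, as the post says, is that the cost of running this extra code is dwarfed by the cost of the page faults it might prevent.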
The second aspect of the application of a working set heuristic will be to alter the scheduling priorities of processes depending on their memory footprint. There are a few options here and I have not looked at this closely enough yet, but things to test could include:
Increasing the priority of the smallest processes – on the basis these might reach the end of execution more quickly and so release memory back to the pool
Radically lowering the priorities of the processes whose pages are being swapped out – on the basis that they do not have a working set of resources available and so, as Denning argued forty years ago, should not be able to run
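The two options above could be prototyped as a simple priority-adjustment pass. The nice-value arithmetic below is an assumption made up for this sketch, not the project's actual design:

```python
# Sketch of the two scheduling tweaks: boost the smallest process (it may
# finish soon and free memory) and heavily penalise processes whose pages
# are being swapped out (they lack a working set, so should not run).
# All numbers here are invented for illustration.

def adjust_priorities(procs, being_swapped):
    """procs: pid -> (footprint_pages, nice); returns pid -> new nice value."""
    smallest = min(procs, key=lambda pid: procs[pid][0])
    new_nice = {}
    for pid, (footprint, nice) in procs.items():
        if pid in being_swapped:
            new_nice[pid] = 19                  # lowest priority: effectively parked
        elif pid == smallest:
            new_nice[pid] = max(-20, nice - 5)  # modest boost for the smallest process
        else:
            new_nice[pid] = nice
    return new_nice

nice = adjust_priorities({1: (40, 0), 2: (9000, 0), 3: (300, 0)}, being_swapped={2})
```

Whether either tweak actually helps is exactly what the project would have to test.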
In practical terms I am still some way off writing any kernel code. I have, though, written some user tools (they still need polishing) to display the memory footprint of Linux processes in a red-black tree (the representation used internally by the kernel). Following Eric S. Raymond (on Unix programming, not politics!), the tools are partitioned into single applications that do different things – but run together they can generate graphics such as the one below:
The one thing that my MSc course has taught me is that things I thought were simple – such as SQL – are in fact fiendishly difficult in the hands of an examiner and with a clock ticking against you.
Simple queries like SELECT Name FROM ListOfNames just are not where it is at.
Here is one exercise I found on the web that I was not able to solve (I gave up after half an hour).
The database of naval ships that took part in World War II is under consideration. The database has the following relations:

Classes(class, type, country, numGuns, bore, displacement)
Ships(name, class, launched)
Battles(name, date)
Outcomes(ship, battle, result)

Ships in classes are arranged to a single project. A class is normally assigned the name of the first ship built to the design (the head ship); otherwise, the class name does not coincide with any ship name in the database. The Classes relation includes the class name, type (bb for a battleship, or bc for a battle cruiser), country where the ship was built, number of main guns, gun calibre (diameter of the gun barrel, in inches), and displacement (weight in tons). The Ships relation includes the ship name, its class name, and launch year. The Battles relation covers the name and date of each battle the ships participated in, while the result of their participation in a battle (sunk, damaged, or unharmed – OK) is in the Outcomes relation. Note: the Outcomes relation may include ships not included in the Ships relation.
Point out the battles in which at least three ships from the same country took part.
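For what it is worth, here is one way the query might be written, sketched with Python's sqlite3 so it runs standalone. This simple version joins Outcomes to Ships to Classes and so ignores the exercise's hint about head ships that appear in Outcomes but not in Ships; the sample data is invented:

```python
# One possible (simplified) answer: battles where at least three ships
# from the same country took part. Head ships present only in Outcomes
# are not handled here, and the sample data is made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Classes(class TEXT, type TEXT, country TEXT,
                     numGuns INT, bore REAL, displacement INT);
CREATE TABLE Ships(name TEXT, class TEXT, launched INT);
CREATE TABLE Outcomes(ship TEXT, battle TEXT, result TEXT);
INSERT INTO Classes VALUES ('Kongo','bc','Japan',8,14,32000),
                           ('Iowa','bb','USA',9,16,46000);
INSERT INTO Ships VALUES ('Kongo','Kongo',1913),('Hiei','Kongo',1914),
                         ('Kirishima','Kongo',1915),('Iowa','Iowa',1943);
INSERT INTO Outcomes VALUES ('Kongo','Guadalcanal','damaged'),
                            ('Hiei','Guadalcanal','sunk'),
                            ('Kirishima','Guadalcanal','sunk'),
                            ('Iowa','Guadalcanal','OK');
""")
rows = conn.execute("""
    SELECT o.battle, c.country
    FROM Outcomes o
    JOIN Ships s   ON o.ship = s.name
    JOIN Classes c ON s.class = c.class
    GROUP BY o.battle, c.country
    HAVING COUNT(DISTINCT o.ship) >= 3
""").fetchall()
```

The trick, which is easy to miss with a clock ticking, is that the grouping has to be by battle *and* country before the HAVING clause can count ships.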