Counting soft and hard faults

When a running program references a page of memory that is not mapped into its address space, the processor raises a “page fault” and the operating system runs some kernel code to ensure that the page is loaded and mapped or, if the address referenced is not legal, that an appropriate error (a segmentation fault, delivered as SIGSEGV on Linux) is signalled and the program’s execution stopped.

If the address is ‘legal’ then two types of fault exist: a ‘hard’ fault, where the missing memory (e.g. some code) has to be loaded from disk, or a ‘soft’ fault, where the missing page is already in memory (typically because it is in a shared library that is being used elsewhere) and so all that has to happen is for the page to be mapped into the address space of the executing program.

(The above is all a simplification, but I hope it is clear enough.)

Soft faults, as you might expect, are handled much faster than hard faults – as disk access is generally many orders of magnitude slower than memory access.

Memory management and paging policies are generally designed to minimise the number of faults, especially hard faults.

So – what is the ratio of soft to hard faults? I have further extended the valext program I wrote for my MSc project to count just that – and it seems that on a loaded Linux system soft faults are generally close to an order of magnitude more common than hard faults, even when launching a new program from ‘scratch’ (e.g. I am seeking to run an instance of ‘audacity’ under valext – and after executing 326,000 instructions there have been 274 soft faults and 37 hard faults, a ratio of roughly 7 to 1).

That is good, of course, because it makes for faster, more efficient computing. But it also means that further optimising the paging policy of Linux is tough – hard faults take time, so you can run a lot of optimising code and still hope for better performance if you cut the number of hard faults even only slightly. But if soft faults outnumber hard faults by 10 to 1 then running a lot of extra code to cut the number of faults may not be so beneficial.

(You can download valext from GitHub. NB: right now this extra feature is in the ‘faultcount’ branch – it will be merged into master in due course.)

Writing more code to avoid writing any of the report?

I have managed to churn out 160 lines of working C today – which I think is quite good going, though, according to this, maybe I could have churned out 400 of C++ or even 960 of Perl (I love Perl but the mind boggles).

My program will tell you how many pages are present or how many have been swapped (it has the ability to do a bit more too, but I have not exploited that even though it is essentially there) – you can fetch it from GitHub here: valext (the name reflects the idea that this was going to be an extension to Valgrind but then I discovered/realised that was never going to work).

Anyway, what I now have to face up to is whether I am churning out code to avoid actually writing any of my MSc project report – I have a feeling that some of that is going on and think I need to start rereading Writing for Computer Science – which helped me greatly with the proposal.

Exams are over, project remains

Today saw my last exam for the MSc, though I still have to complete the project.

So, lots of posts about Linux page management to follow – I hope.

Books I recommend for Birkbeck MSc Computer Science students

I thought I would list some of the books – other than the set texts – that I think new students (for the 2011/12 intake) at Birkbeck ought to read. Sadly, for me, I didn’t really read any of them until the second year – but they are still useful.

This will probably be an ongoing series.

For C++/programming

Not a book about C++ at all – and very hard going – but Structure and Interpretation of Computer Programs will teach you a lot about computer programming and the fundamentals of computing.

For Fundamentals of Computing

Has to be The Annotated Turing: really a fantastic book, well written as both an exposition of the subject matter and of Turing’s landmark paper.

For operating systems

I would like to recommend The Design of the Unix Operating System, but you may find it hard to get copies of it (though there are a few in the Birkbeck library). The Design and Implementation of the 4.4 BSD Operating System might do instead but is also close to being out of print. Understanding the Linux Kernel is much cheaper but I think it is far too complex as an introductory text.

A cheap and cheerful alternative might be Lions’ Commentary on UNIX with Source Code – which is close to 40 years old, somewhat rudimentary and describes an even more primitive version of Unix than “The Design of…” but does have sources that make it clear how the OS really works.

I won’t recommend a book on UML or the UP because I hate them both and have yet to find a book that doesn’t make that worse.