Counting soft and hard faults

Standard
Harddisk icon is from Oxygen icons (http://www...

Image via Wikipedia

When a running program references a page of memory that is not mapped into its address space the operating system throws a “page fault” – calling some kernel code to ensure that the page is loaded and mapped, or if the address referenced is not legal, that an appropriate error (a seg fault on x86) is signalled and the program’s execution stopped.

If the address is ‘legal’ then two types of fault exist – a ‘hard’ fault where the missing memory (eg some code) has to be loaded from disk or a ‘soft’ fault where the missing page is already in memory (typically because it is in a shared library and being used elsewhere) and so all that has to happen is for the page to be mapped into the address space of the executing program.

(The above is all a simplification, but I hope it is clear enough.)

Soft faults, as you might expect are handled much faster than hard faults – as disk access is generally many orders of magnitude slower than memory access.

Memory management and paging policies are generally designed to minimise the number of faults, especially hard faults.

So – what is the ratio of hard and soft faults? I have further extended the valext program I wrote for my MSc project to count just that – and it seems that on a loaded Linux system soft faults are generally an order of magnitude more common than hard faults even when launching a new program form ‘scratch’ (eg I am seeking to run an instance of ‘audacity’ under valext – and after executing 326,000 instructions there have been 274 soft faults and 37 hard faults).

That is good, of course, because it makes for faster, more efficient computing. But it also means that further optimising the paging policy of Linux is tough – hard faults take time so you can run a lot of optimising code and hope to have a better performance if you cut the number of hard faults even only slightly. But if soft faults out number hard faults by 10 to 1 then running a lot of extra code to cut the number of faults may not be so beneficial.

(You can download valext at github – here NB: right now this extra feature is in the ‘faultcount’ branch – it will be merged into master in due course.)