How much memory do you need?

What’s the best way to speed up your computing experience? As many know the most cost-effective way is often not to buy a new computer with a faster processor but to add more memory to an existing system.

The plot below, based on some results from my PhD research shows just how this works…

timingsIn my case I am measuring how long it takes a processor of a fixed speed to execute a given program while I vary the amount of memory available.

My research centres on “network-on-chip” systems so the issue here is how much memory is available locally (i.e., on the chip). Because if instructions and data are not locally available they have to be fetched in from some more “distant” store (whether that is system global memory or even a disk system). And if space on the chip is limited then you generally have to evict some other piece of memory to make way for the piece needed immediately. Yet if you need to evicted piece again in the future you have to take the time needed to fetch it back in again and so on.

In this case we are using 4096 byte (4K) blocks of memory – i.e., 4K pages. So when it says 10 pages on the x-axis that means there is 40K available locally and so on.

I am testing all this on the OVPsim emulator and I have no independent means of accurately timing how long the process takes – but I can count the number of instructions it takes to complete the task.

Factors that affect the number of instructions taken are – in order of likely importance:

  • the time taken to fetch whole pages in and out of memory – a “hard fault” occurs when a page has to be evicted and a new page brought in instead (the earliest hard faults are not as slow as the later ones as until the full number of pages are used no eviction is required);
  • the time it takes to ensure that pages that are in memory but which are not immediately available for execution (because the number of translation mappings that tell the central processor where they are is limited – to 8 in this test case – and so for each fault we first have to test if the page is available but unmapped). If an unmapped page is available we have a “soft fault” which is much quicker to process than a hard fault as no eviction etc is required;
  • picking the next page that will be evicted if have a hard fault – in this case we aim to pick the page that was least recently used, but even then we have to use an approximation. This process is not the same as evicting the page – it merely indicates which one is the best candidate for eviction should that become necessary.

The number of hard faults is reduced by increasing the amount of memory available (until, of course, you have so much memory you never need evict a page). But as you increase the amount of memory you also make checking for soft faults and picking the next candidate for eviction slower – because there are more pages to check.

And this is what we see in the plot. When memory is in short supply adding even a single extra page can make a big difference. There is then a long range where the decrease is not so great but constant. This reflects the fact that memory addresses are not needed at random and that programs show ‘locality of preference’ (ie if you need one address at one moment you are quite likely to need a nearby address at the next). This locality means that adding extra pages can have a limited impact once you have enough to cover the range of likely picks in a reasonable time frame. Then adding extra memory means that the changes between the different phases of locality becomes smoother, so we still see a reduction in time but not anywhere near as pronounced as before.

Then we get to the point – about 30 pages in this case – where we are close to covering the full range of addresses used, even when we switch between phases. In this case we see a sudden fall again until – at about 33 – 35 pages – we seem to have covered every single address our program will ever use.

After that having more memory makes the system slightly slower (NB in most real-world desk and laptop systems adding more memory does not slow down the system – so don’t worry about that!).

The lesson: if your computer is really slow and you don’t want to replace it, add more memory. But if you computer already has so much memory that it doesn’t know what to do with it and it is slow, you have a bigger problem!

Scale of the task

English: Messages from the Linux kernel 3.0.0 ...
English: Messages from the Linux kernel 3.0.0 booting, from Debian sid i386. (Photo credit: Wikipedia)

I have had a frustrating few days trying to get to grips with two new pieces of the technology: the OVP simulator and the Microblaze processor.

Finally I think the fog is beginning to clear. But that also reveals just what a task I have in front of me: namely to write some kernel code that will boot the Microblaze, establish a virtual memory system and then hand over control to user code, which will have to trap memory faults and pass control back to the privileged kernel.

It is not quite writing an operating system, even a simple one, but it is actually undertaking to write what would be at the core of an OS.

Of course, there are lots of places to borrow ideas from – not least the Linux kernel – but it’s a bit daunting, if also reasonably exciting.

Preciously little books about to help – I shelled out to buy this (having borrowed it from the York Uni library and found it to be an excellent general introduction to the area) – but it’s not a guide to OVP never mind to the Microblaze. If anyone does know of a book that does either I’d be very grateful (maybe it’s my age but electronic books are very much second best to me – you just cannot flick from page to page looking for that key element you read the other day and so on.)

The cost of soft/minor faults

Here is a complete graph of the memory use and fault count for a run of ls -lia.

ls -lia memory useFault count for ls -liaAs you can see there are only soft/minor faults here – as one would expect (this was a machine with a lot of memory), as the C library provides the ls function and it will be loaded in memory (presumably the filesystem information was also in memory).

But there are a lot of soft faults – and these too have a cost, even if nothing like the cost of a hard fault. For a start each soft page fault almost certainly indicates a miss in the processor cache.

The paper linked here also gives a lot of information about Linux’s generation of soft/minor faults and their performance impact – it seems that the kernel is designed to deliver system wide flexibility at the expense of performance.

Hard faults and soft faults in the real world

I ran Audacity under valext and here is the graph of real memory use:

Real memory use of audacity

And here is the soft and hard fault count:

Audacity fault count

My surmise as to what you can see here? Lots of initialising – with memory use shooting up and down – though the low level of hard faults suggests much of this is from libraries already loaded in the system. Then pages getting swapped out as nothing happens – audacity did not actually display a window – not sure why – I killed it after it had been running for about two and half minutes of virtual time (around 24 hours of wall clock time) as that was more than enough time to produce something on screen!

Still, as a first test of the tool, that was not bad.

Counting soft and hard faults

Harddisk icon is from Oxygen icons (http://www...
Image via Wikipedia

When a running program references a page of memory that is not mapped into its address space the operating system throws a “page fault” – calling some kernel code to ensure that the page is loaded and mapped, or if the address referenced is not legal, that an appropriate error (a seg fault on x86) is signalled and the program’s execution stopped.

If the address is ‘legal’ then two types of fault exist – a ‘hard’ fault where the missing memory (eg some code) has to be loaded from disk or a ‘soft’ fault where the missing page is already in memory (typically because it is in a shared library and being used elsewhere) and so all that has to happen is for the page to be mapped into the address space of the executing program.

(The above is all a simplification, but I hope it is clear enough.)

Soft faults, as you might expect are handled much faster than hard faults – as disk access is generally many orders of magnitude slower than memory access.

Memory management and paging policies are generally designed to minimise the number of faults, especially hard faults.

So – what is the ratio of hard and soft faults? I have further extended the valext program I wrote for my MSc project to count just that – and it seems that on a loaded Linux system soft faults are generally an order of magnitude more common than hard faults even when launching a new program form ‘scratch’ (eg I am seeking to run an instance of ‘audacity’ under valext – and after executing 326,000 instructions there have been 274 soft faults and 37 hard faults).

That is good, of course, because it makes for faster, more efficient computing. But it also means that further optimising the paging policy of Linux is tough – hard faults take time so you can run a lot of optimising code and hope to have a better performance if you cut the number of hard faults even only slightly. But if soft faults out number hard faults by 10 to 1 then running a lot of extra code to cut the number of faults may not be so beneficial.

(You can download valext at github – here NB: right now this extra feature is in the ‘faultcount’ branch – it will be merged into master in due course.)