I am still waiting the result of the Birkbeck exam board to find out whether I have been awarded the MSc, but I think I have waited long enough before discussing my thoughts on some of the issues around page caching my project raised.
Linux‘s page caching is based on the “2Q” concept – pages in use by the system are stored in one of two queues (hence “2Q”) – in one of two states – see the diagram below, though this also includes states for pages being flushed from the cache:
Typically a page enters the cache in the INACTIVE UNREF state. If the cache needs to be trimmed pages in this state are the most likely to go (I am simplifying this for ease of exposition). On reference the page will be promoted to the INACTIVE REF state and on a further reference could join the ACTIVE queue – again in the ACTIVE UNREF state.
A “clock” mechanism with two sweeping hands controls the demotion of pages: the first hand marks any page in the REF state as UNREF and if a subsequent reference has not pushed the page back to the REF state then the page could be either removed from the ACTIVE queue or, if in the INACTIVE queue, from the cache completely (this is actually a bit more complicated, for instance pages which are “dirty”, ie requiring write back to disk, are less likely to be demoted than clean pages which can just be dropped, and so on.)
The 2Q mechanism is designed to stop pages which are only referenced once (eg a page required on initialisation) from being kept in the cache in preference to a page that might be referenced more than once – essentially frequently referenced pages get bumped up to the ACTIVE queue are so are less likely to be pushed out of the cache – eg a page in the ACTIVE REF state would face three calls to shrink_xxxx_list before being removed and if another reference happens in that time (crudely, three sweeps of the clock) it will be preserved in the cache.
This behaviour is generally thought to be a strength, but what if it were also a weakness?
The graphic below shows the memory access patterns of a simple Firefox open and close:
It can immediately be seen that memory access falls into clear phases. The problem with 2Q is that it works well within those phases but as the application moves from one phase of memory access to another, 2Q risks holding pages from the old phase in memory for too long – indeed pages from the new phase could even face eviction to preserve pages (once) in the ACTIVE list in memory if memory pressure was high. The old pages have a very high or even infinite reuse distance, but are privileged by 2Q.
How can this be fixed? That’s not so easy to answer – especially as the pages from the old phase may be in use elsewhere in the system (for instance if they came from a shared library). But I am convinced that finding an answer to this could have a big impact on the performance of low memory systems.
- Cache Reheating – Not to be Ignored (mongodb.org)
- Best book on Linux kernel internals (cartesianproduct.wordpress.com)
- Yet Another Misinformed Swipe At Open Source and Linux (ostatic.com)