Had a good meeting with my PhD supervisor today: he was in London – I didn’t have to make a flying visit to York.
So the next steps with my OVPsimMicroblaze code is to model global and local memory – by default OVPsim treats all memory as local, mapping the full 32-bit address space and sparsely filling that as needed. I have imposed an additional constraint of only mapping a few pages, but these are still all “local”: so while the code takes time to execute, what is in effect, a FIFOpage replacement algorithm, there is no time for page loads.
The way round this seems to be to build my global memory as a memory-mapped peripheral device – I can then impose time delays on reads and writes.
But I suppose I am writing this blog instead of writing that code…
I have been struggling with a problem with my Microblaze simulation on OVPsim all week.
I have been trying to make a start on implementing a simple demand paging system and to trigger the hardware exception that the Microblaze CPU manual says should happen when you either try to read from a mapped page that is not in the TLB (ie, a “minor” page fault) or has not been mapped at all (a “major” page fault).
My problem was that – despite having, I thought, specified all the correct parameters for the Microblaze and knowing that I had virtual memory mappings work well, it simply was not happening for me. Instead of the expected exception handling, the simulation reported an error and exited.
But Jamie Garside from the University of York’s computer science department saved my life by (correctly) pointing out that what I also needed to do was turn on ICM_ATTR_SIMEX. Otherwise the simulation will always exit on an exception (or halt in GDB).
I can see why that might make sense – at least the halting in the debugger bit: if an exception is about to be fired you can halt the machine while it is still executing in virtual, as opposed to real, mode and see what has caused it.
It was also a reminder to RTFM – in this case not just the Microblaze manual, but also the OVPsim manual. I cannot rely on Jamie doing that for me everytime.
So to fix my problem I ORed in the missing attribute:
Tonight I finally managed to get the code and the sequencing right to not just boot the Microblaze simulation, but to turn on the MMU.
But I was puzzled as to why, as soon as I had done that, the execution breaks with a segmentation fault and appeared to be executing code I had not written. And then it dawned on me – all my addresses had immediately become virtual addresses.
I have had a frustrating few days trying to get to grips with two new pieces of the technology: the OVP simulator and the Microblaze processor.
Finally I think the fog is beginning to clear. But that also reveals just what a task I have in front of me: namely to write some kernel code that will boot the Microblaze, establish a virtual memory system and then hand over control to user code, which will have to trap memory faults and pass control back to the privileged kernel.
It is not quite writing an operating system, even a simple one, but it is actually undertaking to write what would be at the core of an OS.
Of course, there are lots of places to borrow ideas from – not least the Linux kernel – but it’s a bit daunting, if also reasonably exciting.
Preciously little books about to help – I shelled out to buy this (having borrowed it from the York Uni library and found it to be an excellent general introduction to the area) – but it’s not a guide to OVP never mind to the Microblaze. If anyone does know of a book that does either I’d be very grateful (maybe it’s my age but electronic books are very much second best to me – you just cannot flick from page to page looking for that key element you read the other day and so on.)
Paging and virtual memory is at the heart of just about any computing device – more complex than a DVD player – we use everyday.
Paging is the memory management system based on the idea that we can divide the real memory of our computer into a sequence of smallish (typically 4,096 bytes) of “page frames” and then load the bits of data and program in and out of those frames (in “pages”) as we need them.
So, you can have pages from various different running programs in the page frames at any given time and then use a “virtual memory” system to map the pages placed in an arbitrary frame to the memory address the program thinks the page should be resident in.
It is not the only system we could use – “segments”, which involve moving large chunks (as opposed to small pages) of memory about is one approach, while “overlays” – using part of the memory space as sort of scratchpad working area – is another. More recently, with bigger “traditional” computers very large pages have been used as a way of making, at least in theory, more efficient use of memory now measured in billions (as opposed to a few tens) of bytes.
But paging is easily the most widely used approach and has been integral to the development of “multitasking” and similar shared resources approaches to computing – because paging allows us to just have the useful bits of a program and its data in memory we can have many more programs “running” at a given time.
But my PhD research is pointing me towards some of the weaknesses of the paging approach.
At the heart of the case for paging is the idea of “locality” in a computer’s use of resources: if you use one memory address at one instant there is a high probability you will use a nearby address very soon: think of any sort of sequential document or record and you can see why that idea is grounded in very many use cases of computing devices.
Locality means that it ought to make sense to read in memory in blocks and not just one little drop at a time.
But this principle may be in opposition to efficient usage of memory when competition for space in fierce: such as for the limited local memory resources we have on a Network-on-Chip computer.
Right now I am collecting data to measure the efficiency of 4k pages on such (simulated) devices. With 16 simulated cores trying to handle up to 18 threads of execution competition for pages is intense and the evidence suggests that they are resident, in some cases at least, for many fewer “ticks” than it takes to load them from the next lowest level in the memory hierarchy. On top of that many pages show that the principle of locality can be quite a weak one – pages of code are, in general, quite likely to demonstrate high locality (especially in loops) but pages of read-write memory may not.
I don’t have all the data to hand – essentially I am transforming one 200GB XML file into another XML file which will likely be around the same size and that takes time, even on quite a high spec computer (especially when you have to share resources with other researchers). But I expect some interesting results.
Update: I have truncated this article for now (20 December) as there was an error in my LRU software that made LRU look like a much better performer than it really was. I’ll update this with the correct data shortly….
In 1966 László Bélády published “A study of replacement algorithms for virtual storage computers”, one of the truly epoch making papers for operating system science – the first comprehensive examination of page replacement strategies for virtual memory computers.
These days all but the simplest embedded computing devices will use some sort of virtual memory system because it allows computing devices to (relatively) seamlessly load bits of computer programs in and out of memory as needed – the programs see a faked – virtual – address and so the chunks can be loaded in an out of whatever piece of memory is available without worrying about having to get the chunks into exactly the same place every time.
But in 1966 virtual memory was a new and essentially experimental technology and so Belady’s examination of the different strategies for deciding which chunk (page) of memory was kept or replaced when new pages were required to be loaded is the foundation stone of all the approaches that followed.
This last couple of weeks I have found myself walking in the steps of Bélády as I built software to examine the different performance characteristics of potential page replacement policies in a network-on-chip computer.
I have about 220GB of XML data which represents a record of the memory accesses of an 18 threaded video processing application – and using that data I can test computer system performance using various different policies.