I have been struggling with a problem with my Microblaze simulation on OVPsim all week.
I have been trying to make a start on implementing a simple demand paging system and to trigger the hardware exception that the Microblaze CPU manual says should happen when you either try to read from a mapped page that is not in the TLB (ie, a “minor” page fault) or has not been mapped at all (a “major” page fault).
My problem was that – despite having, I thought, specified all the correct parameters for the Microblaze and knowing that I had virtual memory mappings work well, it simply was not happening for me. Instead of the expected exception handling, the simulation reported an error and exited.
But Jamie Garside from the University of York’s computer science department saved my life by (correctly) pointing out that what I also needed to do was turn on ICM_ATTR_SIMEX. Otherwise the simulation will always exit on an exception (or halt in GDB).
I can see why that might make sense – at least the halting in the debugger bit: if an exception is about to be fired you can halt the machine while it is still executing in virtual, as opposed to real, mode and see what has caused it.
It was also a reminder to RTFM – in this case not just the Microblaze manual, but also the OVPsim manual. I cannot rely on Jamie doing that for me everytime.
So to fix my problem I ORed in the missing attribute:
This follows on from the previous post – here are the plots.
These are based on a run of the PARSEC benchmark suite x264 program – encoding 32 frames of video using 18 threads – 16 worker threads: the plots show how often each 16 byte “line” is used – whether as an instruction is read or the memory is used for read-write storage. Sixteen bytes is both the size of a typical cache line and a read from a DDR memory.
The code plot might suggest there is some pattern – between about segments 100 (offset 0x640 inside the page) and 200 (offset 0xC80) there is an increased hit rate) but my guess is that is an artefact of the particular code being used here, rather a general issue (possible explanations would be a particular library function being very heavily used): though conceivably it could be an issue with the GCC compiler or linker.
That might be worth further exploration, but not for me. From visual inspection I am concluding that the distribution of accesses inside a 4k page doesn’t justify trying to pre-cache particular 16 byte “lines”.
Now I have mapped the speed of OPT and LRU using a traditional two level memory hierarchy, my task is to find something better that might make NoC a more viable computing platform.
One thought that occurred to me was that the preference of C and similar languages for page aligned memory in certain situations might make it feasible or reasonable to cache the first 128 bits (16 bytes) in faster access memory.
So I wrote some code to test which 16 byte “segments” of a page were accessed by code.
The list that follows is for read/write accesses (i.e. not instructions). I will get around to plotting a graph, but it looks like that idea – which was always a bit of a straw clutching exercise – is not going to fly.