## The agony and the ecstasy of debugging

If you have ever written a computer program with any degree of seriousness then you will know the feeling: your heart sinking as you realise what you thought was a perfectly good piece of code has a bug somewhere less than obvious.

In my case this has happened twice in a week and both times has meant the work I had done as part of my PhD has had to start again (not all of it, obviously, but this, most recent bit). Yesterday evening’s realisation was particularly annoying because it came after I had sent my supervisor an email suggesting I had some quite interesting and counter-intuitive results to share.

Since then I had spent quite a few hours trying to work out what on Earth was wrong – debugging assembly is not complex in the sense that most instructions do simple things – but it also reminds you of the essential state-machine-nature of a general computing device: there are lots of things to track.

Of course, that also brings pleasure – there is no point in denying that solving these problem is one of the truly engaging things about computing.

Job is done now and I am once again collecting results and hoping that I do not spot another flaw.

## Chase down that bug

If there are rules for software development, one of them should be never let a bug go unsquashed.

This has been demonstrated to me again this week – when I had a bug in some Microblaze interrupt code. I realised I no longer needed to use the code and then puzzled over whether to find out what was wrong anyway.

I got some sound advice:

And I am very glad I acted on it – not least because it seems to me my problem was actually caused by a small bug in the OVP model for the Microblaze (the simulator model allows interrupts to be executed while an exception is in progress, but a real processor would not) and, hopefully, tracking down the bug there will benefits others in the future.

## Hard fault count drives performance

Just to emphasise how hard faults are determining for performance – here is a plot of hard faults versus page count for the same application mentioned in the previous post.

The pattern is very similar, though it should be noted that increasing the page count does still keep the fault count coming down at the far end – but not by enough to outweigh the negative effects of having larger page tables to search through when checking for faults and looking for the least recently used page and the like.

## Squashing the LRU bug

Just for once I did not rush to an online forum and say I had found a bug in a product – and I was right not too.

Having tried three different cross compiler toolchains I convinced myself that the issue was plainly neither compiler (or, to be more accurate in this case, assembler) output but some process in my code that was causing corruption. And sure enough I found that I was mangling the physical addresses of my page frames.

Thanks to the way OVPsim operates – by default it provides physical mappings on demand for the full 4GB address space of a 32 bit register – this mangling did not generate a page fault, but it did mean that for certain sequences of instructions – particularly those where lots of page faults were likely to occur, memory was being corrupted.

Changing one line of assembly – so that virtual address output was written to virtual address slot and not the physical address slot in my simple list of page table entries fixed that.

So now the code works – at least I think it does!

## Curiouser and curiouser – the case of the LRU bug

My LRU queue bug is continuing to puzzle me – and it’s not as simple as a data misalignment. In fact it does not appear to be a data misalignment issue at all: before I was trapping a lot of hardware exceptions under that header because it was a common fault when I got the code wrong, but a closer examination showed it to be an illegal opcode exception.

How that could be caused by the size of the local memory we were simulating was beyond me – but perhaps some code was being pushed out of alignment and an illegal instruction created, I thought.

But that does not appear to be the issue at all – in fact the really puzzling thing is that the exact same string of opcodes at the same addresses runs without a problem in the version with the functional memory sizes as with the “broken” memory sizes.

The only difference seems to be that when the broken code (ie the setup with the non $mod 4$ number of 4k memory pages) raises an illegal opcode exception, the good code raises a page fault.

It looks like it might be a bug in the simulator itself – and having written that I am hoping that the bad workman’s curse now befalls me and I quickly find it was all my fault to begin with. But as of now I am drawing a blank.

## LRU queue strangeness

For the last week or so I have been writing and then debugging (and so mainly debugging) a least-recently-used (LRU) page replacement system on my Microblaze simulation.

Perhaps I shouldn’t have bothered – I had a working first-in-first-out (FIFO) system after all. But no one seriously uses FIFO, so I had to write some LRU code.

I thought I had it working tonight – it ran through the target exercise in about 6 million instructions (as the MMU in a Microblaze is crude, memory gets loaded in and out ‘by hand’ and so the instruction count measures time/efficiency) when the system had 32 4k local pages and in about 10.5 million instructions when it had 24 4k pages available locally – all seems fine: less pages means more time is required.

But then things started to get weird – testing the system with 28 pages took about 9 million instructions, but when I tried to use 26 pages I had to break the code after it had executed 14 trillion instructions.

Indeed it seems to only work for a very limited number of page counts. How odd – though a typically debuggers story. A week in, finally thinking I’d cracked it when some really strange behaviour manifests itself.

Update: It appears to be an unaligned data exeception issue. Somewhere along the line a piece of code relies on the LRU queue to be a multiple of 4 in length would be my guess…

## Simulating global and local memory on OVP

Had a good meeting with my PhD supervisor today: he was in London – I didn’t have to make a flying visit to York.

So the next steps with my OVPsim Microblaze code is to model global and local memory – by default OVPsim treats all memory as local, mapping the full 32-bit address space and sparsely filling that as needed. I have imposed an additional constraint of only mapping a few pages, but these are still all “local”: so while the code takes time to execute, what is in effect, a FIFO page replacement algorithm, there is no time for page loads.

The way round this seems to be to build my global memory as a memory-mapped peripheral device – I can then impose time delays on reads and writes.

But I suppose I am writing this blog instead of writing that code…