Curses on ncurses


gdb icon, created for the Open Icon Library
gdb icon, created for the Open Icon Library (Photo credit: Wikipedia)

Every programmer will be familiar with something like this…

A little while back I wrote a program that simulates – crudely but effectively – a multicore NoC device. I use it to model the execution times of different page replacement algorithms.

The input is XML generated via a step by step trace of a working program. The actually instructions being traced do not matter – what I care about are the memory access patterns.

To allow me to test more models more quickly I have now written some R code that generates a semi-random access pattern based, very loosely indeed, on the patterns seen in the real program. The advantage is I can test against a set number of memory accesses but with a range of pseudo-random access patterns, so although I am not running models against the “real” access pattern, neither am I taking three weeks per experiment.

But when I used the artificially generated access patterns, my program crashed with a seg fault. But even more confusingly, when I ran the code in GDB, the GNU Debugger, if I stepped through the code it worked, but I just ran the code in debugger then it crashed just as it did without using the debugger.

After a few hours I realised why – in my artificial patterns, the first thing the first thread does is spawn all the other threads to be used. In real world code, of course, these spawns take place after quite some code has been executed.

Every code spawn causes the ncurses code I am using to update the screen. When using ‘real’ access patterns these updates take place comfortably after all the ncurses environment has been set up (by a separate thread), but in the artificial code, the thread updates are the first thing that get posted to the screen, even before ncurses has been set up – hence the crash.

If I step through the code then the ncurses thread runs ahead and sets up the screen before I hit the thread update code and again it works.

The solution? Use a condition variable and a mutex to ensure that nothing executes before the ncurses environment is fully established.

Not a big deal – but perhaps, at some point in the future someone struggling to understand why their code – which previously worked so well – has now stopped processing what seems to be well-formed input. Hope this helps!

Sometimes, admitting defeat is the very best thing you can do


English: Screenshot of GDB, the debugger of th...
English: Screenshot of GDB, the debugger of the GNU Project. . (Photo credit: Wikipedia)

I have spent the last month and half or so writing a particular computer program to model how some code would run on a 16 core “network-on-chip” device.

There were a lot of problems to get over – for although I had already written a Groovy/Java program to do the same thing, that code just was not up to handling the additional level of complexity I wanted for this iteration, so the whole thing had to be redone in C/C++.

About three weeks ago I got to the point where the code compiled and it should have been doing what I wanted, but it was crashing with memory bugs, a lot.

Debugging multi-threaded Posix Threads code is not meant to be easy but I was able – after teaching myself the basics of the GNU debugger (GDB) – to make a lot of progress (having realised that the old programmer’s saw of “if God had wanted us to use debuggers, she wouldn’t have given us printf” does not raise many laughs when you are dealing with multi-threaded code).

I discovered that the bugs were in my four year old red-black tree code. I had used it before in many situations and never had a problem, but also realised that where I used it before (and where I am still using it now), I was continually adding nodes to the tree and not deleting them. The deletion code was wrong.

Most people write their red-black tree code having read Introduction to Algorithms – but here, at least, I was not like most people. I had written the code myself more or less from scratch, piecing together the various things I needed to do from a lot of different online sources. And I was very proud of that too.

But that also meant that, when I tried to fix the code by looking at the canonical description in “Introduction…” the match was poor and my model was, at a quite fundamental level, different (I used null pointers for the leaves, not the “guard node” approach advocated in the book.)

I thought I could get around the issue by not doing any rebalancing after a deletion – after all I was still inserting nodes even as I was deleting them and my insertion code seems (and still seems) quite robust.

But while that allowed by code to run quite a lot longer before falling over, it still fell over. I had to face the facts and use the Standard Template Library (STL) classes – map and set.

I started work on that on Friday on a long train journey. I thought it would be weeks of further chopping and changing. In fact it took little more than 48 hours and I am wondering why I put myself through such misery when robust and well-tested solutions were always to hand (I kept my code for one tree I used where I am only doing insertions – as converting that would have taken a further day, but that was the margin I was operating in – having contemplated weeks, I now worried it would take more than a few hours).

The new code seems pretty robust, but I am struggling to get it on the university’s compute server as that is now under such heavy load, and so I am thinking of adding an nCurses interface while I wait.

Admitting defeat was the quickest way, it seems, to victory. Something worth remembering, perhaps, when your software project is in trouble.

Enhanced by Zemanta