Reviving this blog … with a question about binary trees

Work changes and a determination to actually finish my PhD mean I really should make a bit of an effort here and so I will.

Here is a puzzle that has been bothering me about binary trees which has come from my PhD research…

In that research I am investigating how to implement virtual memory on a many-core Network-on-Chip (NoC) system. Essentially I have been building and running emulators.

The core of the emulation is the “memory tree” – the connection between the cores of the NoC and the global memory controller. The “Bluetree” memory tree is a binary tree arrangement, with the root node (level 0) connecting to the memory controller and the leaves at the NoC tiles (with each tile having a single processor).

At every non-leaf level of the tree there are multiplexors (mux) with a buffer that can hold a single memory packet – so if two requests for global memory arrive at once and the buffer is free there needs to be an arbitration process to decide which packet gets to occupy the buffer and which has to wait.

We have 128 leaf nodes – as in this tree…

Binary tree with 128 leaf nodes

With this many leaf nodes, the question of the arbitration policy of the muxes is important. A naïve approach would be to say, if two packets contest for the buffer, let the packet on the left win the first time and then pass the packet on the right: after all this guarantees no starvation.

But it is actually a hopeless policy … the leftmost leaf node (shall number from zero) faces no opposition, the rightmost (number 127) loses out to every other node.

But number of leaf node alone is not a sufficient way of describing the relative number of blocks – for instance leaf node 16 appears to be in a better place than leaf node 15 – as going up the tree leaf node 15 can be blocked at level 6 (the root node is at level 0), at level 5, level 4, level 3 and level 2: while leaf node 16 is only blocked at level 1.

Practical testing of this naïvely fair approach in the emulator gives results like the following… as can be seen the right hand part of the tree performs appallingly but even there lots of variation can be seen:

some test results

My question is this: what is the correct (i.e., correct mathematically) way of describing the order of the leaf nodes?


Delays in a binary memory tree queue

This is a bit niche, but I spent a fair bit of this weekend working out what the algorithm to calculate how long a memory request packet would take to traverse a binary tree (from a leaf to the root) was. And now I have written a Groovy script (testable in your browser at – click on ‘edit’ and then ‘run’) to calculate the results.

(I am sure that this has been done before – many times – but I couldn’t find the algorithm on a quick search and then the desire to solve this myself took over).

The problem is this: memory request packets enter the tree at a leaf, they then have to cross a series of multiplexors until they reach the root (which is where the tree interfaces with the memory supply). Each multiplexor (mux) has two inputs and one output, so taken together the muxes form a binary tree.

As request packets could arrive at the leaves of a mux at the same time, there has to be a decision procedure about which packet progresses and which is delayed. The simple choice is to favour packets on either the left or the right leaf (in my case I went with the right). The question is then what is the average and maximum delay for a request packet.

So, if you muck about with the script you’ll see that the absolute maximum delay on a 256 leaf tree (eg., on a Bluetree memory tree connected to a 256 tile NoC) is 495 cycles, while the average maximum delay (ie for a leaf in the middle of the tree) is 248 cycles.

In contrast if the load is just 1% then these figures fall to about 3 and 1.5.

There is a flaw here though – because this all assumes that no new packets enter the system while the original request packet is traversing it – but in a dynamic system, even with a load as low as 1%, this is highly unlikely – packets that enter the tree “to the right” of the original request could potentially add to the delay if they chain with packets already present.

Plunging deeper

English: Example red-black tree
English: Example red-black tree (Photo credit: Wikipedia)

For the last month I have been working hard on some C/C++ code to simulate a 16 core computer.

I had already got some code that did this – written in Groovy – but the limitations of the Java VM made it just too difficult to write efficient code to do what I really wanted – which was to simulate the performance of such a 16 core device using Belady’s OPT (optimal – or so-called “clairvoyant”) page replacement policy. The groovy code was going to take two years just to generate the memory reference strings – never mind how quickly it might “execute” the code.

Using C++ (with a C interface as I am just a little bit more comfortable in that language) and some code I wrote when doing my MSc to generate a red-black tree (a semi-balanced binary tree) I was able to build some code that generates the reference strings in a few hours: it’s enough to restore your faith in traditional compiled, natively-executing, code.

But building the simulator to handle the reference strings is another matter. Last weekend, up in York, I was able to write many hundreds of lines and make quite a bit of progress, but nothing was really working.

A week later, and some crash self-tutoring in using the GDB debugger I now have got somewhere, but now have to confront a deeper problem – my red-black tree code is fundamentally broken somewhere.

I wrote this code in the summer of 2010 because I had had such a bad time of it on the C++ course at Birkbeck (my own fault, I didn’t pay enough attention). For the close-to-four-years I have looked on it as something of a coding pinnacle: and certainly it does quite efficiently generate a templatised red-black tree if you are just sticking new values in one after another. But it would appear (I am not quite sure if this is correct – but the evidence points that way) to be quite a bit broken if you subsequently start deleting some nodes in the tree and adding new nodes in: the crashes I am seeing suggest my code creates a loop – the left and right branches pointing back to the parent.

So, the work must go on and I will have to pore over some four year old code (not brilliantly documented) to fix it.

Enhanced by Zemanta

The Reingold-Tilford algorithm revisited

hw5_ivmooc_rt_tree (Photo credit: marianocecowski)

A while ago, as part of early research into what became my MSc project, I wrote code to create and then draw red-black trees using C++.

To draw the trees I used the venerable Reingold-Tilford algorithm, which is more or less the standard approach. I wrote some blogs about it and pages here seem to come pretty high up in Google searches for the algorithm, so I get passing traffic regularly as a result.

But idly chasing these links has led me to a chapter from the forthcoming Handbook of Graph Drawing and Visualization edited by Roberto Tamassia which has a chapter on tree drawing by Adrian Rusu, which contains bad news for us Reingold-Tilford fan boys, as this summary from the book of an experiment comparing algorithmic performance shows (emphasis added):

• The performance of a drawing algorithm on a tree-type is not a good predictor of the performance of the same algorithm on other tree-types: some of the algorithms perform best on a tree-type, and worst on other tree-types.
Reingold-Tilford algorithm [RT81] scores worse in comparison to the other chosen algorithms for almost all ten aesthetics considered.
• The intuition that low average edge length and area go together is contradicted in only one case.
• The intuitions that average edge length and maximum edge length, uniform edge length and total edge length, and short maximum edge length and close farthest leaf go together are contradicted for unbalanced binary trees.
• With regards to area, of the four algorithms studied, three perform best on different types of trees.
• With regards to aspect ratio, of the four algorithms studied, three perform well on trees of different types and sizes.
• Not all algorithms studied perform best on complete binary trees even though they have one of the simplest tree structures.
The level-based algorithm of Reingold-Tilford [RT81] produces much worse aspect ratios than algorithms designed using other approaches.
• The path-based algorithm of Chan et al. [CGKT02] tends to construct drawings with better area at the expense of worse aspect ratio.

Red-black trees

Example red-black tree
Image via Wikipedia

Binary trees are seen and used frequently in computing science and computing. They are a good abstraction for many naturally occurring relationships (most of our mathematics is based on binary operations, for instance) and have O(log n) complexity (ie if you went from searching a tree of 1000 elements to a tree of 100,000 elements then the search should not take 100 times longer but about 10.)

Of course that log n goodness requires the tree to be “balanced” ie for any given node there should be roughly equal numbers to the left and the right. One way of doing this is through a “red black tree” – here nodes in the tree are assigned a colour: the root is always black. The rule is that any traversal from the root to the leaves should always go through an equal number of black nodes and to ensure this is possible red nodes may be inserted in the tree, but no red node may have another red node as an immediate descendant. (A full explanation is in Introduction to Algorithms though one can also work off various explanations on the internet, though they tend to be less than complete.)

The Linux kernel natively implements a red black tree (in C) and the first bit of work I started on my MSc project was to write my own, userland, implementation so I could see processes in the same way the kernel did.

As I had got a less than glorious mark (still a pass, so that’s what counts) in the C++ exam last year I also decided that I would write this in C++ using templates. A few days after I started I discovered that actually the writers of the STL had got there before me, but as this was an academic exercise I ploughed on with my own implementation.

Essentially this is what I did on my two week summer holiday in Scotland last year! When I was there I also started (though completed when I got home) a couple of helper applications to position the tree according to Reingold and Tilford’s algorithm (which I had to translate from PASCAL) for “better drawing of trees” and a Qt application to display it all.

In fact I had a nagging problem with the Reingold-Tilford algorithm which I finally got around to fixing last night.

LaTeX to PNG representation of processes

(Interestingly the code also allows you to use the Turing-complete capabilities of LaTeX by specifying a TeX output that uses LaTeX’s own positioning algorithm – something I picked up from The LATEX Graphics Companion – that is what the example shown above uses, though unfortunately for even moderately loaded systems the LaTeX processor objects to the width of the output).

Fancy trying it? I could do with someone giving it a bash on a BSD system – not needed for my course but interesting none the less.

The code is all at GitHub memball gives the basic GraphML or TeX or plaintext output, treedraw will convert the GraphML to an SVG or serialiszed stream using Reingold and Tilford’s algorithm and treeqt will use Qt to display the tree using the serialized class. You may have to download various libraries to get it to work (certainly the libproc-dev package on Ubuntu/Debian) – I couldn’t get memball to work on a Windows machine using Cygwin but maybe even that is fixable.

There is a script in the treeqt repo to make it easier: download the sources form all three repos, build them and then run:

./setup | ./treeqt --r 1