I love binary trees


A few years ago I was very proud of myself for managing to translate the original Pascal of the Reingold-Tilford algorithm into C/C++ and every since then I have had a fascination for drawing binary trees.

NoC with binary memory tree

 

The one drawn here was actually done “by hand” in Metapost – I realised only as I finished it that I could have or maybe should used an algorithmic approach and not a heuristic (“does it look ok?”), though Metapost is a pretty inflexible programming tool (though as this book shows can produce some beautiful results) and I reckon a decent tree drawing algorithm in that ought to be worth at least an MSc.

The Reingold-Tilford algorithm revisited


hw5_ivmooc_rt_tree
hw5_ivmooc_rt_tree (Photo credit: marianocecowski)

A while ago, as part of early research into what became my MSc project, I wrote code to create and then draw red-black trees using C++.

To draw the trees I used the venerable Reingold-Tilford algorithm, which is more or less the standard approach. I wrote some blogs about it and pages here seem to come pretty high up in Google searches for the algorithm, so I get passing traffic regularly as a result.

But idly chasing these links has led me to a chapter from the forthcoming Handbook of Graph Drawing and Visualization edited by Roberto Tamassia which has a chapter on tree drawing by Adrian Rusu, which contains bad news for us Reingold-Tilford fan boys, as this summary from the book of an experiment comparing algorithmic performance shows (emphasis added):

• The performance of a drawing algorithm on a tree-type is not a good predictor of the performance of the same algorithm on other tree-types: some of the algorithms perform best on a tree-type, and worst on other tree-types.
Reingold-Tilford algorithm [RT81] scores worse in comparison to the other chosen algorithms for almost all ten aesthetics considered.
• The intuition that low average edge length and area go together is contradicted in only one case.
• The intuitions that average edge length and maximum edge length, uniform edge length and total edge length, and short maximum edge length and close farthest leaf go together are contradicted for unbalanced binary trees.
• With regards to area, of the four algorithms studied, three perform best on different types of trees.
• With regards to aspect ratio, of the four algorithms studied, three perform well on trees of different types and sizes.
• Not all algorithms studied perform best on complete binary trees even though they have one of the simplest tree structures.
The level-based algorithm of Reingold-Tilford [RT81] produces much worse aspect ratios than algorithms designed using other approaches.
• The path-based algorithm of Chan et al. [CGKT02] tends to construct drawings with better area at the expense of worse aspect ratio.

Red-Black tree generated by memball and treedraw


 

Red-Black Tree

This is a (much reduced in scale) red-black tree of processes running on my main desktop, ordered by allocated memory (produced by my memball and treedraw programs – so the tree is structured using the Reingold-Tilford algorithm).

If you count you will see that the path from the root (the little black ball at the top) to the leaves  – even smaller black dots at the bottom – always passes through five intervening black balls.

The graphic is a PNG produced from the very large (in scale) SVG. This is the far (highest memory use) end:

Biggest memory usersAnd here are the smallest memory users:

Smallest memory users

Red-black trees


Example red-black tree
Image via Wikipedia

Binary trees are seen and used frequently in computing science and computing. They are a good abstraction for many naturally occurring relationships (most of our mathematics is based on binary operations, for instance) and have O(log n) complexity (ie if you went from searching a tree of 1000 elements to a tree of 100,000 elements then the search should not take 100 times longer but about 10.)

Of course that log n goodness requires the tree to be “balanced” ie for any given node there should be roughly equal numbers to the left and the right. One way of doing this is through a “red black tree” – here nodes in the tree are assigned a colour: the root is always black. The rule is that any traversal from the root to the leaves should always go through an equal number of black nodes and to ensure this is possible red nodes may be inserted in the tree, but no red node may have another red node as an immediate descendant. (A full explanation is in Introduction to Algorithms though one can also work off various explanations on the internet, though they tend to be less than complete.)

The Linux kernel natively implements a red black tree (in C) and the first bit of work I started on my MSc project was to write my own, userland, implementation so I could see processes in the same way the kernel did.

As I had got a less than glorious mark (still a pass, so that’s what counts) in the C++ exam last year I also decided that I would write this in C++ using templates. A few days after I started I discovered that actually the writers of the STL had got there before me, but as this was an academic exercise I ploughed on with my own implementation.

Essentially this is what I did on my two week summer holiday in Scotland last year! When I was there I also started (though completed when I got home) a couple of helper applications to position the tree according to Reingold and Tilford’s algorithm (which I had to translate from PASCAL) for “better drawing of trees” and a Qt application to display it all.

In fact I had a nagging problem with the Reingold-Tilford algorithm which I finally got around to fixing last night.


LaTeX to PNG representation of processes

(Interestingly the code also allows you to use the Turing-complete capabilities of LaTeX by specifying a TeX output that uses LaTeX’s own positioning algorithm – something I picked up from The LATEX Graphics Companion – that is what the example shown above uses, though unfortunately for even moderately loaded systems the LaTeX processor objects to the width of the output).

Fancy trying it? I could do with someone giving it a bash on a BSD system – not needed for my course but interesting none the less.

The code is all at GitHubhttp://github.com/mcmenaminadrian: memball gives the basic GraphML or TeX or plaintext output, treedraw will convert the GraphML to an SVG or serialiszed stream using Reingold and Tilford’s algorithm and treeqt will use Qt to display the tree using the serialized class. You may have to download various libraries to get it to work (certainly the libproc-dev package on Ubuntu/Debian) – I couldn’t get memball to work on a Windows machine using Cygwin but maybe even that is fixable.

There is a script in the treeqt repo to make it easier: download the sources form all three repos, build them and then run:

./setup | ./treeqt --r 1