Passing by reference in C++ – a downside

Code (Photo credit: Riebart)

Once you realise what is happening this is obvious, but it took me a while…

I wanted to write some longs out as character strings in C++, so wrote some code like this:

void writeLongToFile(ofstream& xmlFile, long& value)
stringstream stringy;
stringy << value;
xmlFile << stringy.rdbuf();

But compile time error after compile timer error followed as I was told that I could not use this code with unsigned long or with const long and so on … but surely these are simple conversions I thought…

…Just me thinking like a C programmer again. Passing by reference is just that – there is no copying or conversion available and so long and unsigned long are fundamentally incompatible.

The solution? Just pass by value – afterall the saving in passing a more or less primative type like long by reference must be pretty minimal – passing the stream by reference is the real saving here (actually, it’s also practical – as we want the output to go to the real stream and not a copy: in C we’d use a pointer here).

Enhanced by Zemanta

Plunging deeper

English: Example red-black tree
English: Example red-black tree (Photo credit: Wikipedia)

For the last month I have been working hard on some C/C++ code to simulate a 16 core computer.

I had already got some code that did this – written in Groovy – but the limitations of the Java VM made it just too difficult to write efficient code to do what I really wanted – which was to simulate the performance of such a 16 core device using Belady’s OPT (optimal – or so-called “clairvoyant”) page replacement policy. The groovy code was going to take two years just to generate the memory reference strings – never mind how quickly it might “execute” the code.

Using C++ (with a C interface as I am just a little bit more comfortable in that language) and some code I wrote when doing my MSc to generate a red-black tree (a semi-balanced binary tree) I was able to build some code that generates the reference strings in a few hours: it’s enough to restore your faith in traditional compiled, natively-executing, code.

But building the simulator to handle the reference strings is another matter. Last weekend, up in York, I was able to write many hundreds of lines and make quite a bit of progress, but nothing was really working.

A week later, and some crash self-tutoring in using the GDB debugger I now have got somewhere, but now have to confront a deeper problem – my red-black tree code is fundamentally broken somewhere.

I wrote this code in the summer of 2010 because I had had such a bad time of it on the C++ course at Birkbeck (my own fault, I didn’t pay enough attention). For the close-to-four-years I have looked on it as something of a coding pinnacle: and certainly it does quite efficiently generate a templatised red-black tree if you are just sticking new values in one after another. But it would appear (I am not quite sure if this is correct – but the evidence points that way) to be quite a bit broken if you subsequently start deleting some nodes in the tree and adding new nodes in: the crashes I am seeing suggest my code creates a loop – the left and right branches pointing back to the parent.

So, the work must go on and I will have to pore over some four year old code (not brilliantly documented) to fix it.

Enhanced by Zemanta

Code Club first session

Code Club logoAt last managed to lead my first “Code Club” session – it had a slightly chaotic start as none of the computers we were using had Scratch installed and nor did we have access to a login that allowed us to install Scratch in the Windows “Programs” directory – but once we worked around that we all had great fun.

From the start it was obvious that Scratch made sense to the kids – they immediately grasped that the endless loop control would set the actions it enclosed to run endlessly. Of course nobody (apart from Visual Basic users?) works with similar simple graphics tools when writing an industrial strength program, but that was not the point: this is about teaching loops, conditionals and branches and so on.

The lost time at the start meant it was all a bit hurried so I do not know how much of the programming the children took in – as opposed to just ensuring that their Scratch scripts matched those in the worksheet. But on the first time out – none of the children had used Scratch before – simply being able to manipulate the programming elements was probably more than enough.

In any case, all of them were hugely enthusiastic when I told them they could install Scratch on any computer they had at home and practise on it there.

Code Club feels like a huge success to me already.

The expressive power of BASIC

As I have been working on BINSIC – my reimplementation of BASIC as a domain specific language via Groovy– I have been increasingly struct by how unbelievably awful BASIC is (at least in the ZX80/ZX81 dialect that I am basing all this on).

basic coding
basic coding (Photo credit: Terry Freedman)

My memories of it are of a fun language where it was possible to quickly pull together some useful code to do the sorts of things 15, 16 and 17-year-old geeks were interested in.

But I really have to wonder now – it doesn’t even support user-defined functions: looking back I wonder why I wasn’t more enthused by PASCAL when I met it and its procedural programming paradigm at university: it certainly feels that I ought to have seen it as a fantastic improvement (though by then I was more into Z80 machine code than any high-level language).

But BASIC does have its strengths – as I have found out.

This piece of code is a prime example: 100 INPUT V

This means create the numeric variable called V and assign to it the value typed in by the user at the keyboard.

Trying to do this in Groovy/Java requires the creation of a whole new class just to handle the keyboard input, as well as mess about with thread synchronisation to ensure that the process waits for the input … what follows is just a part:

	def waitOnInput()
		def textIn = binsicEngine.binsicWindow.textIn
		def countDown = new CountDownLatch(1)
		def inputAction = new BinsicInputAction(textIn, binsicEngine.preProc,
		textIn.getActionMap().put(BinsicConstants.INPUT, inputAction)
		return inputAction.result


Is Groovy back in fashion?

Last year I was taught “Object Orientated Design and Programming” as part of my Birkbeck MSc, using Groovy, a dynamic functional language built on top of Java and running on the Java VM.

I enjoyed it and liked Groovy – I went on to write some pieces of software for my MSc project using it.

But it also gave the impression of being a dying language and there were some complaints from fellow students who thought C# or Java itself would have been a better bet for them jobs wise (to which one of the lecturers responded with admirable chutzpah with a suggestion of using Lisp in the future).

This last week I have again been dabbling in Groovy and I get a sense that the language is suddenly back in fashion and its community of users seems more energy charged than a year ago.

Nothing scientific to back that feeling up with, just my judgement.

An example of the poor editing in O’Reilly’s “Programming Android”

Image by 小宗宗 via Flickr

OK, I don’t really want to sound like I am bashing this book – Programming Android: Java Programming for the New Generation of Mobile Devices – because, by its very nature, writing a technical book must be highly demanding in terms of accuracy and I see no signs of any mistakes – just what I think is poor editing. See if you agree…

So, the book is discussing how to serialize classes using Android’s Parcelable interface, and makes this, correct point about serializing an enum type:

“Be sure, though, to think about future changes to data when picking the serialized representation. Certainly it would have been much easier in this example to represent state as an int whose value was obtained by calling state.ordinal. Doing so, however, would make it much harder to maintain forward compatibility for the object. Suppose it becomes necessary at some point to add a new state … this trivial change makes new versions of the object completely incompatible with earlier versions.”

But then discussing de-serialization, the book states, without comment:

“The idiomatic way to do this is to read each piece of state from the Parcel in the exact same order it was written in writeToParcel (again, this is important), and then to call a constructor with the unmarshaled [sic] state.”

Now, technically, these passages are not in disagreement – but it is clearly the case that the de/serialization process is highly coupled with the data design – something that ought to be pointed out, especially if we are going to make a big deal of it on the serialization phase.

My first R program

Having used Groovy (which makes the scripting environment feel familiar) and some Scheme (via Structure and Interpretation of Computer Programs), R does feel completely alien, but it still feels like a steep learning curve.

But here’s my short script –

unpatched <- read.csv("~/unpatched.txt")
unpatchcons <- transform(unpatched, realm=realm*60 + reals)
plot(size, realm, log="y")
abline(reg=linelog, untf=TRUE, col="blue",lty=3)

And here’s the graph (of Linux kernel compile times) it generates – the blue line is obviously a very bad fit!

Linux kernel compile times

Writing more code to avoid writing any of the report?

The C Programming Language
Image by mrbill via Flickr

I have managed to churn out 160 lines of working C today – which I think is quite good going, though, according to this, maybe I could have churned out 400 of C++ or even 960 of Perl (I love Perl but the mind boggles).

My program will tell you how pages pages are present or how many have been swapped (it has the ability to do a bit more too, but I have not exploited that even though it is essentially there) – you can fetch it from github here: valext (the name reflects the idea that this was going to be an extension to Valgrind but then I discovered/realised that was never going to work).

Anyway, what I now have to face up to is whether I am churning out code to avoid actually writing any of my MSc project report – I have a feeling that some of that is going on and think I need to start rereading Writing for Computer Science – which helped me greatly with the proposal.

Performance collapse in the Open JVM

Duke, the Java Mascot, in the waving pose. Duk...
Image via Wikipedia

Unfortunately, I do not have time to investigate this further myself, but others may do.

But yesterday I had a serious performance issue with the (open) JVM – though I was able to solve it with an algorithm change – swapping the problematic (integer) code for a lot of floating point maths: not the usual way to fix a performance issue but one that works.

My original code (in Groovy) appended many millions of integers to a list and then, once a loop was complete, calculated the average for the list (calculating the average working set size for a running process). When I was dealing with 2 – 3 million integers it worked well and performance, if anot exactly zipping along, was good. Push that up to 10 – 11 million and the first couple of times through the loop CPU utilisation dropped precepitatively (this was multithreaded – with runs through the loop operating in parallel) but the code was still visibly working but after that the intervals between loop completion grew to the point that the code seemed to have failed.

Even when I pre-allocated 0x1000000 items in what I now explicitly declared as an ArrayList the performance was little better – the first couple of iterations seemed a bit faster but performance then died.

I do not know what is going on – though excessive memory fragmentation perhaps coupled with poor garbage collection seem like the obvious answers: seems there is probably a brick wall for ArrayList size that sees whatever memory allocation algorithm operates inside the JVM fall over.

How did I fix it? Update the average in real time – in pseudo code below:

average = 0
previousInstructions = 0
loop [0, maxInstructions - 1)
if (change_in_working_set_size) {
average = ((average * previousInstructions) + workingSetSize() * (currentInstructions - previousInstructions))/currentInstructions
previousInstructions = currentInstructions

Like I say, floating point, but it works.

Java Performance Tuning

Eighteen million lines of XML!

Image via Wikipedia

I have just finished a Groovy (the language) helper application to process the output of Valgrind‘s “lackey” package into an XML file for further analysis. More proof of the power of Groovy (a lanuage I am coming to like) – but also a sign that maybe I should get a more powerful computer – the XML file that came from the loading of xterm ran to over 18 million lines.