A few years ago, on my Computer Science MSc, there was something of a mini-revolt as some of the students – already working as developers – complained that the object-orientated design course was being taught in Groovy – a JVM-based language that, in effect, is a dynamic extension of static Java. They said there were no jobs in Groovy so why were we being taught that?
I wasn’t one of them. I wasn’t (and I am not) working as a developer and so Groovy, which crossed the boundaries between Java’s imperative and Scala’s functional approaches, was interesting and powerfully expressive. But, yes, it was a bit of a niche.
I have come back to Groovy now because, for my PhD, I want to write a more powerful and dynamic NoC simulation than has proved possible in C++. Groovy has the tools – especially closures – that allow the writing of good DSLs and so was a natural choice.
But the Groovy landscape seems barren. As I write I haven’t been able to make any progress on my code, because it seems a Java update broke Groovy support and the infrastructure for Groovy support through http://groovy-lang.org appears to have collapsed.
Though, lately, I have (re-)discovered the joys of C++ – even if I do write code in that language like a C programmer.
In the last six months I have written a lot of code designed to process a lot of data. I started off with Groovy – my data was XML and Groovy comes with tools that are specifically designed to make a programmer’s life a lot easier when handling XML.
But, in the end, Groovy just was not up to the task: for my needs, processing 200 GB of XML, Groovy (or, rather, the Java VM on which it runs) was just not fast or flexible enough.
My response to that was to go “route 1” and reach for C. I know and knew I could build what I wanted in C and that it would be as fast as I could reasonably hope to get it (going down to Assembly was just not a serious option).
However, the code I am actually using is in C++. I found that the abstractions offered by the STL were worth the price in terms of speed of execution and code that is somewhat opaque to the GDB debugger.
It’s a compromise, of course, but I suspect if I had tried to write the equivalent of the STL map and set classes in C, I’d still be debugging my implementation many weeks later – after all I found that my C++ red-black tree code was actually broken despite using it (successfully) for a number of years.
Real programmers have to make these compromises – and C++ struck the right note between language expressiveness and language power for me.
Last year I was taught “Object Orientated Design and Programming” as part of my Birkbeck MSc, using Groovy, a dynamic functional language built on top of Java and running on the Java VM.
I enjoyed it and liked Groovy – I went on to write some pieces of software for my MSc project using it.
But it also gave the impression of being a dying language, and there were some complaints from fellow students who thought C# or Java itself would have been a better bet for them jobs-wise (to which one of the lecturers responded, with admirable chutzpah, by suggesting Lisp might be used in the future).
This last week I have again been dabbling in Groovy and I get a sense that the language is suddenly back in fashion and its community of users seems more energy charged than a year ago.
Nothing scientific to back that feeling up with, just my judgement.
My expensive, meant-to-be-fast server became available today and, while I would not hesitate to recommend Hetzner (the people I ordered it from), the performance is disappointing.
I don’t think the issue is the box itself – it has 12 cores and 25 GB of RAM. Instead the issue seems to be the code – and I think that means the open JVM (the OpenJDK JRE). If I specify 12 threads then performance falls to about 500% (i.e. the equivalent of about five cores); if I specify 7 or 8 threads I seem to get peak performance at 600 – 700%. In other words, over a quarter of the silicon is going unused.
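For anyone who wants to check whether a JVM workload scales with thread count in this way, here is a minimal sketch in Java (not the Groovy code I was actually running). The `busyWork` task is a made-up CPU-bound stand-in, and the task count and iteration figures are arbitrary assumptions; the only point is to time the same batch of work on pools of different sizes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ThreadScaling {
    // Hypothetical CPU-bound task standing in for the real workload.
    static long busyWork(long iterations) {
        long acc = 0;
        for (long i = 0; i < iterations; i++) {
            acc += i % 7;
        }
        return acc;
    }

    // Runs `tasks` copies of busyWork on a fixed pool of `threads` threads
    // and returns the elapsed wall-clock time in milliseconds.
    static long timeRun(int threads, int tasks, long iterations) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            Callable<Long> job = () -> busyWork(iterations);
            long start = System.nanoTime();
            for (int t = 0; t < tasks; t++) {
                results.add(pool.submit(job));
            }
            for (Future<Long> f : results) {
                f.get(); // wait for every task to finish
            }
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        // Same batch of work each time; only the pool size changes.
        for (int threads = 1; threads <= cores; threads++) {
            System.out.printf("%d threads: %d ms%n", threads, timeRun(threads, 24, 50_000_000L));
        }
    }
}
```

If the elapsed time stops falling well before the thread count reaches the core count, that would be consistent with what I am seeing, though a toy task like this says nothing about why.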
I think I need to install the Sun/Oracle JVM and see how that runs.
Unfortunately, I do not have time to investigate this further myself, but others may do.
But yesterday I had a serious performance issue with the (open) JVM – though I was able to solve it with an algorithm change – swapping the problematic (integer) code for a lot of floating point maths: not the usual way to fix a performance issue but one that works.
My original code (in Groovy) appended many millions of integers to a list and then, once a loop was complete, calculated the average for the list (calculating the average working set size for a running process). When I was dealing with 2 – 3 million integers it worked well and performance, if not exactly zipping along, was good. Push that up to 10 – 11 million and, the first couple of times through the loop, CPU utilisation dropped precipitously (this was multithreaded, with runs through the loop operating in parallel) though the code was still visibly working. After that, the intervals between loop completions grew to the point that the code seemed to have failed.
Even when I pre-allocated 0x1000000 items in what I now explicitly declared as an ArrayList the performance was little better – the first couple of iterations seemed a bit faster but performance then died.
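To make the shape of that original approach concrete, here is a rough Java sketch of it (my real code was Groovy, and the class name, method name and values here are invented for illustration). It gives the list a capacity hint of 0x1000000, appends a few million integers, and only then works out the average.

```java
import java.util.ArrayList;
import java.util.List;

class WorkingSetAverage {
    // Averages a list of sizes after the fact; an empty list gives 0.0.
    static double averageOf(List<Integer> sizes) {
        if (sizes.isEmpty()) {
            return 0.0;
        }
        long total = 0; // long, so summing millions of ints cannot overflow
        for (int size : sizes) {
            total += size;
        }
        return (double) total / sizes.size();
    }

    public static void main(String[] args) {
        // Capacity hint matching the 0x1000000 figure in the text.
        List<Integer> sizes = new ArrayList<>(0x1000000);
        for (int i = 0; i < 3_000_000; i++) {
            sizes.add(i % 1000); // made-up working set sizes; each add boxes an Integer
        }
        System.out.println(averageOf(sizes));
    }
}
```

One thing worth noting is that an `ArrayList<Integer>` holds references to boxed `Integer` objects rather than raw ints, so a list of this size puts a lot of small objects on the heap. Whether that was what hurt my code I cannot say.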
I do not know what is going on, though excessive memory fragmentation, perhaps coupled with poor garbage collection, seems like the obvious answer: there is probably a brick wall for ArrayList size at which whatever memory allocation algorithm operates inside the JVM falls over.
How did I fix it? Update the average in real time – in pseudo code below: