Similarity, difference and compression


I am in York this week, being a student and preparing for the literature review seminar I am due to give on Friday – the first staging post on the PhD route, at which I have to persuade the department I have been serious about reading around my subject.

Today I went to a departmental seminar given by Professor Ulrike Hahne of Birkbeck College (formerly of Cardiff University). She spoke on the nature of “similarity” – as is the way of these things, it was a quick rattle through a complex subject, and if the summary that follows is inaccurate then I am to blame, not Professor Hahne.

Professor Hahne is a psychologist, but she has worked with computer scientists and so her seminar did cut into computer science issues. She began by stating that it is fair to say all things are equally the same (or different) – in the sense that one can find an infinite number of ways in which two things can be categorised alike (object A weighs less than 1kg, object B weighs less than 1kg, they both weigh less than 2kg and so on). I am not sure I accept this argument in its entirety – in what way is an object different from itself? But that is a side issue, because her real point was that similarity and difference are products of human cognition, which I can broadly accept.

So how do we measure similarity and difference? Well, the “simplest” way is to measure the “distance” between two stimuli in the standard geometric way – this is how we measure the difference between colours in a colour space (about which more later), i.e., the square root of the sum of the squared differences along each dimension. This concept has even been developed into the “universal law of generalisation”. The idea has achieved much, but it has major deficiencies.
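For the record, here is a minimal sketch of that geometric measure in Perl, treating colours as points in RGB space – the function name and test values are my own illustration, not anything from the seminar:

use strict;
use warnings;

# Euclidean distance between two colours in RGB space:
# the square root of the sum of squared differences on each axis.
sub colour_distance {
    my ($a, $b) = @_;    # each an arrayref of [r, g, b]
    my $sum = 0;
    $sum += ($a->[$_] - $b->[$_]) ** 2 for 0 .. 2;
    return sqrt $sum;
}

printf "%.2f\n", colour_distance([255, 0, 0], [0, 0, 255]);     # red v blue: large
printf "%.2f\n", colour_distance([255, 0, 0], [250, 10, 5]);    # two near-reds: small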

Professor Hahne outlined some of the alternatives before describing her interest in (and defence of) the idea that the key to difference is the number of mental transformations required to change one thing into another – for instance, how different is a square from a triangle? Two transformations are required: first to think of the triangle, and then to replace the square with the triangle, and so on.

In a more sophisticated way, the issue is the Kolmogorov complexity of the transformation: the shorter the program we can write to make the transformation, the more similar the two objects are.
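Kolmogorov complexity itself is uncomputable, but an off-the-shelf compressor gives a practical stand-in: the normalised compression distance (NCD) treats two strings as similar if they compress better together than apart. Here is a sketch in Perl using Compress::Zlib – my own illustration of the general idea, not anything Professor Hahne presented:

use strict;
use warnings;
use Compress::Zlib;    # exports compress() by default

# Normalised compression distance: a computable proxy for
# Kolmogorov-style similarity. Values near 0 mean "very similar",
# values near 1 mean "very different".
sub ncd {
    my ($x, $y) = @_;
    my $cx  = length compress($x);
    my $cy  = length compress($y);
    my $cxy = length compress($x . $y);
    my ($min, $max) = $cx < $cy ? ($cx, $cy) : ($cy, $cx);
    return ($cxy - $min) / $max;
}

my $squares   = "square " x 50;
my $triangles = "triangle " x 50;
printf "%.3f\n", ncd($squares, $squares);      # near 0: identical
printf "%.3f\n", ncd($squares, $triangles);    # larger: related but different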

This, it strikes me, has an important application in computer science, or at least it could have. To go back to the colour space issue: when I wrote the Perl module Image::Pngslimmer I had to write a lot of code that computed geometric distances between colour points – a task that Perl is poor at, as maths there is slow. This was to implement the so-called “median cut” algorithm (I am pleased to say that the Wikipedia article on the median cut algorithm cites my code as an example, and it wasn’t even me who edited it to say that, at least as far as I can remember!), in which colours are quantised to those at the centre of “median cut boxes” in the colour space. Perhaps there is a much simpler way to make this transformation and so compress the PNG more quickly?
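For the curious, the shape of median cut in a bare-bones Perl sketch – this is not the Image::Pngslimmer code, just the idea: keep splitting the box with the widest colour spread at its median until you have as many boxes as palette entries, then take each box’s mean colour:

use strict;
use warnings;
use List::Util qw(sum);

# Bare-bones median cut: $pixels is an arrayref of [r, g, b] triples,
# $n the number of palette entries wanted. Returns $n mean colours.
# e.g. my @palette = median_cut(\@all_pixels, 16);
sub median_cut {
    my ($pixels, $n) = @_;
    my @boxes = ([@$pixels]);
    while (@boxes < $n) {
        # pick the box and channel with the widest range
        my ($box_i, $chan, $best) = (0, 0, -1);
        for my $i (0 .. $#boxes) {
            next if @{ $boxes[$i] } < 2;    # cannot split a single pixel
            for my $c (0 .. 2) {
                my @v = sort { $a <=> $b } map { $_->[$c] } @{ $boxes[$i] };
                my $range = $v[-1] - $v[0];
                ($box_i, $chan, $best) = ($i, $c, $range) if $range > $best;
            }
        }
        last if $best < 0;    # nothing left to split
        # split that box at the median of the chosen channel
        my @sorted = sort { $a->[$chan] <=> $b->[$chan] } @{ $boxes[$box_i] };
        my $mid = int(@sorted / 2);
        splice @boxes, $box_i, 1,
            [ @sorted[ 0 .. $mid - 1 ] ], [ @sorted[ $mid .. $#sorted ] ];
    }
    # palette entry = mean colour of each box
    my @palette;
    for my $box (@boxes) {
        push @palette,
            [ map { my $c = $_; int(sum(map { $_->[$c] } @$box) / @$box) } 0 .. 2 ];
    }
    return @palette;
}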

I asked Professor Hahne about this and she confirmed that her collaborator, Professor Nick Chater of Warwick University, is interested in this very question. When I have got this week out of the way I may have a look at his published papers and see if there is anything interesting there.