Similarity, difference and compression


I am in York this week, being a student and preparing for the literature review seminar I am due to give on Friday – the first staging post on the PhD route, at which I have to persuade the department I have been serious about reading around my subject.

Today I went to a departmental seminar, presented by Professor Ulrike Hahne of Birkbeck College (and formerly of Cardiff University). She spoke on the nature of “similarity” – as is the way with these things, it was a quick rattle through a complex subject, and if the summary that follows is inaccurate then I am to blame and not Professor Hahne.

Professor Hahne is a psychologist, but she has worked with computer scientists and so her seminar did cut into computer science issues. She began by stating that it is fair to say all things are equally the same (or different) – in the sense that one can find an unlimited number of attributes under which any two things can be categorised in the same way (object A weighs less than 1kg, object B weighs less than 1kg, they both weigh less than 2kg and so on). I am not sure I accept this argument in its entirety – in what way is an object different from itself? But that is a side issue, because her real point was that similarity and difference are products of human cognition, which I can broadly accept.

So how do we measure similarity and difference? Well, the “simplest” way is to measure the “distance” between two stimuli in the standard geometric way – this is how we measure the difference between colours in a colour space (about which more later), i.e., as the square root of the sum of the squares of the differences along each dimension. This concept has even been developed into the “universal law of generalisation”. The idea has achieved much but has major deficiencies.
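
To make that concrete, here is a minimal sketch of that geometric distance for two colours in RGB space – my own illustration, not anything from the seminar, and in Perl since that is what comes up later:

#!/usr/bin/perl
use strict;
use warnings;

# Euclidean distance between two colours in RGB space: the square root of
# the sum of the squared differences along each channel.
sub colour_distance {
    my ($a, $b) = @_;
    my $sum = 0;
    $sum += ($a->[$_] - $b->[$_]) ** 2 for 0 .. 2;
    return sqrt($sum);
}

# the colour values below are made-up examples
print colour_distance([255, 0, 0], [200, 30, 10]), "\n";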

Professor Hahne outlined some of the alternatives before describing her interest in (and defence of) the idea that the key to difference is the number of mental transformations required to change one thing into another – for instance, how different is a square from a triangle? Two transformations are required: first to think of the triangle and then to replace the square with the triangle, and so on.

In a more sophisticated way, the issue is the Kolmogorov complexity of the transformation. The shorter the program we can write to make the transformation, the more similar the objects are.
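
Kolmogorov complexity itself is not computable, but a standard practical proxy – and this is my gloss, not something from the seminar – is to let a general-purpose compressor stand in for the shortest program, as in Cilibrasi and Vitányi’s normalised compression distance. A rough Perl sketch using Compress::Zlib:

#!/usr/bin/perl
use strict;
use warnings;
use Compress::Zlib;

# Normalised compression distance: uses compressed length as a rough
# stand-in for Kolmogorov complexity. Values near 0 mean very similar,
# values approaching 1 mean the strings share little structure.
sub ncd {
    my ($x, $y) = @_;
    my $cx  = length compress($x);
    my $cy  = length compress($y);
    my $cxy = length compress($x . $y);
    my ($min, $max) = $cx < $cy ? ($cx, $cy) : ($cy, $cx);
    return ($cxy - $min) / $max;
}

my $repeats = "abcabcabc" x 20;
my $noise   = join '', map { chr(65 + int rand 26) } 1 .. 180;
printf "%.3f\n", ncd($repeats, $repeats);   # two identical strings: close to 0
printf "%.3f\n", ncd($repeats, $noise);     # unrelated strings: much larger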

This, it strikes me, has an important application in computer science, or at least it could have. To go back to the colour space issue: when I wrote the Perl module Image::Pngslimmer I had to write a lot of code that computed geometric distances between colour points – a task that Perl is poor at, as its maths is slow. This was to implement the so-called “median cut” algorithm (pleased to say that the Wikipedia article on the median cut algorithm cites my code as an example, and it wasn’t even me who edited it to say that, at least as far as I can remember!), in which colours are quantised to those at the centres of “median cut boxes” in the colour space. Perhaps there is a much simpler way to make this transformation and so compress the PNG more quickly?
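
For what it is worth, here is a very stripped-down sketch of the median cut idea – an illustration written for this post, not the code that is actually in Image::Pngslimmer: keep splitting the box with the widest spread of values at its median pixel until you have as many boxes as you want palette entries, then represent each box by its mean colour.

#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(min max sum);

# Median cut sketch: $pixels is a reference to a list of [r, g, b] triples,
# $target is the number of palette entries wanted. Unoptimised on purpose.
sub median_cut {
    my ($pixels, $target) = @_;
    my @boxes = ([@$pixels]);                 # one box holding every pixel
    while (@boxes < $target) {
        # pick the box and channel with the widest spread of values
        my ($box_idx, $channel, $widest) = (0, 0, -1);
        for my $i (0 .. $#boxes) {
            for my $c (0 .. 2) {
                my @vals = map { $_->[$c] } @{ $boxes[$i] };
                my $spread = max(@vals) - min(@vals);
                ($box_idx, $channel, $widest) = ($i, $c, $spread)
                    if $spread > $widest;
            }
        }
        last if $widest <= 0;                 # nothing left worth splitting
        # sort that box on the chosen channel and cut it at the median pixel
        my @sorted = sort { $a->[$channel] <=> $b->[$channel] }
                     @{ $boxes[$box_idx] };
        my $mid = int(@sorted / 2);
        splice @boxes, $box_idx, 1,
            [ @sorted[0 .. $mid - 1] ], [ @sorted[$mid .. $#sorted] ];
    }
    # represent each box by the mean colour of the pixels it contains
    return map {
        my $box = $_;
        [ map { my $c = $_;
                int( sum(map { $_->[$c] } @$box) / @$box + 0.5 ) } 0 .. 2 ];
    } @boxes;
}

In practice you would then map every pixel to the nearest of those representative colours – which brings back exactly the distance calculations sketched above, and is where the Kolmogorov-style question about a cheaper route becomes interesting.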

I asked Professor Hahne about this and she confirmed that her collaborator Professor Nick Chater of Warwick University is interested in this very question. When I have got this week out of the way I may have a look at his published papers and see if there is anything interesting there.

 

Smaller PNG graphic files from your LaTeX


I use LaTeX (with LyX as the front end) to generate the occasional bits of mathematical script that appear here. Is it bad to admit it gives me a small thrill to see how the LaTeX is transformed into something mathematical – it makes me feel like a proper scientist/mathematician. Well, I’ve done it now…

WordPress.com will not display PostScript natively (or if it can, I have never been able to get it to work), so I usually open the .ps files in the GIMP and convert them to PNG – PNG being well suited to things like line drawings and graphical renderings of text.

The PNGs that come out of this are of a reasonable size, but they could be smaller. Some years ago I wrote some Perl code to do just that (I needed something that worked in a CGI environment, and while Perl is not fast even at integer maths, it was the best option).

That code is available on CPAN as Image::Pngslimmer, and on a typical Linux/BSD (OS X too?) install – assuming you have perl and the cpan module installed – you should be able to get it on to your box with sudo cpan Image::Pngslimmer. CPAN may ask you a lot of questions before it installs if you have never used it before, but, while YMMV, the default answers should generally work.

You can write your own script or just use this quick and dirty one for PNGs of 64K or less:

#!/usr/bin/perl -w
use strict;
use Image::Pngslimmer();

my $srcfile = shift or die "usage: $0 some.png\n";
open(INF, $srcfile) or die "cannot open $srcfile: $!\n";
binmode INF;
my $buffer;
read(INF, $buffer, 65535);               # quick and dirty: only handles PNGs of 64K or less
close INF;
my $blob2 = Image::Pngslimmer::indexcolours($buffer);        # convert to palette-based colour where possible
my $blob3 = Image::Pngslimmer::discard_noncritical($blob2);  # drop non-critical chunks
print Image::Pngslimmer::zlibshrink($blob3);                 # recompress the image data

To use this, copy the script above into your editor and save it with a name, e.g. shrinker.pl, and then execute it like this: perl shrinker.pl some.png > smaller.png – this will create a new file, smaller.png, from some.png (which is untouched).

The results can be reasonably impressive. This file – created by cutting from the GIMP – comes in at 14.4KB (NB the graphics you see here are preprocessed by wordpress.com, click on the graphics to see the original):

Proof graphic, uncompressed

But after a bit of processing with the above script, this comes in at 9.7KB:

Compressed proof graphic

Finally, please do not say this doesn’t matter because your pipe is so fat. Millions of these things are being pumped through the internet every minute and the collective saving of bandwidth really would matter if we could deliver it…