The Crimea, Urdu and MD5 hashes: a partial reply to Professor Rubin



The charge of the gallant three hundred, the Heavy Brigade!
Down the hill, down the hill, thousands of Russians,
Thousands of horsemen, drew to the valley–and stay’d;
For Scarlett and Scarlett’s three hundred were riding by
When the points of the Russian lances arose in the sky;
And he call’d, ‘Left wheel into line!’ and they wheel’d and obey’d.
Then he look’d at the host that had halted he knew not why,
And he turn’d half round, and he bade his trumpeter sound
To the charge, and he rode on ahead, as he waved his blade
To the gallant three hundred whose glory will never die–
‘Follow,’ and up the hill, up the hill, up the hill,
Follow’d the Heavy Brigade.

is the first stanza of Tennyson’s Charge of the Heavy Brigade.


بہادر تین سو کے انچارج، ہیوی بریگیڈ!
پہاڑ کے نیچے، پہاڑی کے نیچے، ہزاروں روسی
سوار ہزاروں کی تعداد میں متوجہ وادی اور رہے؛
کے لئے سرخ اور سرخ کے تین سو کی طرف سے سوار تھے
روسی بالا کے پوائنٹس جب آسمان میں پیدا؛
اور اس نے فون کیا، “لائن میں بائیں پہیا! ‘اور وہ پہیوں اور اطاعت ہے.
پھر وہ میزبان ہے کہ روک دیا تھا اور وہ جانتا تھا کہ کیوں نہیں دیکھا،
اور وہ نصف دور کر دیا، اور اس نے اپنی بگل کھلاڑی آواز بڑے
چارج کرنے کے لئے، اور وہ پر آگے سوار، کے طور پر اس نے اس کے بلیڈ لہرایا
بہادر تین سو جن کے جلال کبھی نہیں گا مرنا
‘، عمل کریں اور پہاڑی کے اوپر اوپر پہاڑی، پہاڑ اپ،
بھاری بریگیڈ کے بعد.

is an undoubtedly very poor Urdu translation (via Google Translate) of the same.

While this:


is the MD5 hash of the English verse.
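For anyone who wants to reproduce a digest of this kind, here is a minimal Python sketch. It assumes the text is encoded as UTF-8, and the helper name `md5_hex` is purely illustrative. The digest depends on the exact bytes hashed (punctuation, quote style, line endings), so no expected value for the verse is given.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of `text` (UTF-8 encoded) as 32 hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Opening line of the stanza, typed with plain ASCII punctuation (an assumption;
# different punctuation or whitespace would give a completely different digest).
line = "The charge of the gallant three hundred, the Heavy Brigade!"
digest = md5_hex(line)
print(len(digest), digest)
```

Whatever the length of the input, the output is always 32 hexadecimal characters (128 bits).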

So which – the Urdu or the MD5 hash – is more similar to the English verse? You might answer neither – particularly if you can read any Urdu. But I think most would agree that the Urdu lines are more similar than the hexadecimal string.

Therefore, asserts Professor Paul Rubin, in response to my account yesterday of Professor Ulrike Hahne’s seminar on difference and similarity, the argument that difference is related to the Kolmogorov complexity of converting one object/stimulus to another falls flat on its face, as computing an MD5 hash is less computationally difficult than translating a longish piece of verse. It’s a persuasive argument and I am not going to challenge it directly.

Yet there is another aspect to this. The MD5 hash is undoubtedly more different from the English verse than the Urdu is, and it is also more difficult to convert the hash back to the verse. Indeed, there is no real way to computationally convert such hashes back to the original. Arguably the verse is one solution to the reversal, but (presumably; I am not really familiar with MD5 hashing beyond the most basic idea) there are an infinite number of such solutions, since inputs of any length are mapped to a fixed-length, 128-bit hash. To ‘computationally’ go from the hash to the verse we would have to write a programme that wrote out the characters of the poem and discarded the hash altogether.
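The point that many inputs share one hash can be seen in miniature with a toy example. The sketch below keeps only the first two hex characters of each MD5 digest (a deliberate simplification, giving just 256 possible values), so the pigeonhole principle guarantees a clash within 257 distinct inputs. The names `short_digest` and `first_collision` and the `verse-N` candidate strings are invented for illustration. This is not a way of finding collisions in the full 128-bit hash.

```python
import hashlib


def short_digest(text: str, hex_chars: int = 2) -> str:
    """First `hex_chars` hex characters of the MD5 digest of `text`."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()[:hex_chars]


def first_collision(hex_chars: int = 2):
    """Try 'verse-0', 'verse-1', ... until two share a truncated digest."""
    seen = {}  # truncated digest -> first input that produced it
    n = 0
    while True:
        candidate = f"verse-{n}"
        key = short_digest(candidate, hex_chars)
        if key in seen:
            return seen[key], candidate, key
        seen[key] = candidate
        n += 1


a, b, key = first_collision()
print(f"{a!r} and {b!r} both map to {key!r}")
```

Given only the truncated digest there is no way to tell which of the two inputs produced it. The same holds, on a vastly larger scale, for the full hash.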

This relies on an important aspect of difference – it is not generally commutative, i.e., if we took \Delta to be the ‘difference’ operator then A \Delta B = B \Delta A does not hold in all cases. Or, if you want a word experiment that makes the same point: in the minds of most people, “a line at 85 degrees to the horizontal is almost vertical, but a vertical line is not almost 85 degrees to the horizontal”.
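A rough way to see this asymmetry numerically is to use a general-purpose compressor as a stand-in for Kolmogorov complexity. Kolmogorov complexity itself is uncomputable, so this is only a crude upper bound, and the `cost` function below (the extra compressed bytes needed for the target once the source is known) is an assumption of this sketch, not anything from the seminar. Note also that a compressor does not "know" the MD5 algorithm, so it ignores the fact that the hash can be cheaply computed from the verse. The verse is typed here with plain ASCII punctuation.

```python
import hashlib
import zlib


def c(data: bytes) -> int:
    """Compressed length in bytes: a crude proxy for description length."""
    return len(zlib.compress(data, 9))


def cost(source: bytes, target: bytes) -> int:
    """Extra compressed bytes needed to describe `target` given `source`."""
    return c(source + target) - c(source)


verse = (
    "The charge of the gallant three hundred, the Heavy Brigade!\n"
    "Down the hill, down the hill, thousands of Russians,\n"
    "Thousands of horsemen, drew to the valley-and stay'd;\n"
    "For Scarlett and Scarlett's three hundred were riding by\n"
    "When the points of the Russian lances arose in the sky;\n"
).encode("utf-8")
digest = hashlib.md5(verse).hexdigest().encode("ascii")

print("verse -> hash:", cost(verse, digest))
print("hash -> verse:", cost(digest, verse))
```

Going from the hash to the verse costs roughly as much as writing out the verse from scratch, while going the other way costs only about as much as the short hash itself, so the two directions are far from equal.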

So, the Kolmogorov complexity theory still has some legs to stand on.

5 thoughts on “The Crimea, Urdu and MD5 hashes: a partial reply to Professor Rubin”

  1. I was going to say that the argument for K complexity didn’t really fall flat on its face, but your (correct) observation about difference may doom it. Mathematical difference is indeed typically not commutative, but linguistically “difference” typically is. Certainly “similarity” is generally a commutative relation. So if K difference is not commutative, it’s on shaky ground as a similarity measure.

    • Oh, well you differ radically from Professor Hahne here – her argument was founded on the experimentally backed observation that “linguistic” (or more accurately, cognitive) difference was not generally commutative. Of course in some cases it is: a square and a triangle are as different for humans in either direction. But many other things are not – e.g., the line example above.

    • One further thought about commutativity and difference: humans are quite sensitive to the second law of thermodynamics in their observations on difference – i.e., they see how a system with low entropy is similar to a system with higher entropy but not the other way round – at least this is what Professor Hahne stated (though not quite in these terms) and I can instinctively see that it makes sense.

      • The thought struck me that the K difference seems to be independent of context. Suppose I pointed out a blonde and a redhead to you, seated in a pub in Dublin, and asked if they were similar. You might say no. Repeat the experiment in a village in the Amazon basin, or in rural Swaziland, and you might say yes. But the length of a program to convert the blonde to a redhead (or vice versa) would not change.

  2. Ah, but the issue here is not the length of the program to (genetically) convert one to the other, but the length of the program to allow me to differentiate the two hair types. Though mainly your comment has set off a desire to be on Baggot Street right now.
