My MSc project is ambitious, sometimes it seems too ambitious: to test various ways in which the Linux VMM and scheduler could be modified to improve performance under high levels of stress.
Books are an essential guide in computer science, perhaps the essential guide, and having a book that helped me get to grips with issues such as measuring the Linux kernel’s use of the VM subsystem and how to design tests of application behaviour and page faulting rates and so on is increasingly important.
I use LaTeX (LyX as the front end) to generate the occasional bits of mathematical script that appear here. Is it bad to admit it gives me a small thrill to see how the LaTex is transformed into something mathematical – makes me feel like a proper scientist/mathematician. Well, I’ve done it now…
WordPress.com will not display postscript natively (or if it can I have never been able to get it to work), so I usually open the .ps files in GIMP and convert them to a PNG – PNGs being optimised to handle things like line drawings and graphical images of text.
The PNGs that come out of this are of a reasonable size but they could be smaller. Some years ago I wrote some Perl code to do that (I needed something that worked in a CGI environment and while Perl is not so fast for even integer maths it was the best option).
That code is available on CPAN as Image::Pngslimmer and on a typical Linux/BSD (OSX too?) install (I am assuming you have perl and the cpan module installed) you should be able to get it on to your box with sudo cpan Image::Pngslimmer – which may ask you a lot of questions before it installs if you have never used CPAN but while YMMV the default answers should probably work.
You can write your own script or just use this quick and dirty one for PNGs of 64K or less:
my $srcfile = shift;
open INF, $srcfile;
read(INF, $buffer, 65535);
my $blob2 = Image::Pngslimmer::indexcolours($buffer);
my $blob3 = Image::Pngslimmer::discard_noncritical($blob2);
To use this copy the script above into your editor and save it with a name eg shrinker.pl and then execute like this: perl shrinker.pl some.png > smaller.png which will create a new file smaller.png from some.png (which is untouched).
The results can be reasonably impressive. This file – created by cutting from the GIMP – comes in at 14.4KB (NB the graphics you see here are preprocessed by wordpress.com, click on the graphics to see the original):
But after a bit of processing with the above script, this comes in at 9.7KB:
Finally, please do not say this doesn’t matter because your pipe is so fat. Millions of these things are being pumped through the internet every minute and the collective saving of bandwidth really would matter if we could deliver it…
A few weeks ago I wrote a blog with a joke title about the end of the world because I noticed that, on 2 June, an object would come closer than the moon.
Strangely, or maybe not so strangely, it is now getting hits from people seemingly convinced the Japanese earthquake means maybe we are in the “end of days”. So here are a few facts to think about:
Earthquakes regularly are the “largest ever recorded” because we have not been recording them for long
Despite all the horrific pictures from yesterday and the individual grief, for which people have my sympathy, the truly remarkable thing is that so few people have been killed because of the quality of the engineering (nuclear engineers may have a few questions to answer though)
Do not worry about 2009 BD – even if the scientists have got the calculations radically wrong (extremely unlikely) and it crashes into earth, the estimated yield will be about 7 kilotons of explosives – yesterday’s quake unleashed an estimated 336 megatons (or more) and we got through that
(Apparently Fibonacci, real name Leonardo Pisano Bigollo, used the sequence to model rabbit populations in the 13th century, though perhaps more importantly he essentially codified the western system of arabic numbers. The sequence was known by Indian mathematicians long before then though.)
But the sequence is perhaps most interesting for its relationship to the “golden ratio” (or golden mean), phi:
which means phi has the value:
This is the ratio where:
The interesting thing about the Fibonacci series for n is that each number in the series is the closest integer to:
A regex is, as any fule kno, a regular expression: a pretty fundamental concept in computing (the picture here shows an evaluation mechanism for one, albeit for a limited language) – computers, being deterministic, rely on “regular languages” for programs and so regexes are generally speaking widely used in parsing the languages.
But what has motivated me to write this blog (apart from the need to demote a story with the word “sexy” in the headline) is that they are the sort of thing that ought to be taught more widely than the low-quality stuff that seems to pass for ICT education in schools.
Because they link a computing fundamental to something that would be useful in everyday life. Think of it this way – you have a document which logs 20,000 library loans with a record tag and a 24 hr clock. You know about 500 of them are of interest to you because they record when, say, the system recorded “DVD borrowed” or “CD borrowed” (ie something like “DVD borrowed: 17:03:19”) – how can you extract just that?
A regex could do that in seconds – and while they can look complex, they are not that difficult to learn – at the risk of getting it all wrong (as I haven’t tested it) then the one for this could be (depending on the precise format):
Of course you have to have something with a regex engine available to crack that – though it’s actually quite a widely supported option in much free and proprietary software, it’s just nobody teaches you this hugely useful skill.
In it Professor Rubin lists three prominent actresses who are/were also scientists with serious achievements to their name and says that he already knew about one of them … Danica McKellar.
But even as I read it I realised that while I didn’t know about her, I did (vaguely) know about Hedy Lamarr – I think I may have read about her in Andrew Tannenbaum’s Computer Networks– which I read large chunks of earlier this academic year. As I recall her proposal on frequency-hopping was not taken seriously by the US Navy at the time (during the war), which proved to be a pretty big mistake. It’s now widely used in all sorts of wireless communications.
Anyway, I challenged Professor Rubin’s challenge to me about the potential brokenness of public key encryption if P = NP before I realised he was likely someone with the intellectual firepower to destroy me an instant – so I hope he will take this link back to his blog as a peace offering.
Plus, writing this is more interesting (what isn’t?) than constructing Unified Process use cases for my object orientated design and programming class…
Been an interesting week here on the blog … as I said before getting on Slashdot is great as you get access to the intellect of a lot of people, and that taught me quite a bit.
It’s not relevant to my course but I went to the (Maths shelves of) the Birkbeck library and withdrew The Language of Machines – one of the few books on computability in the library.
But before I get down to reading much of that I think I need to reply to some of the harsher remarks that were made about this blog on Slashdot – in particular the piece I wrote at the end of December called “What if P=NP?”
The opening line of that article is “I admit I now going slightly out of my depth” and it does contain a few errors, though nothing I think that justifies some of the really nasty things a couple of people wrote about it – but then again some people also insisted that proving P=NP would not break public key encryption so plainly there are people on /. whose ignorance is matched by the vehemence of their opinion, so maybe I shouldn’t let it get to me: especially as one of the critics insisted there were no algorithms to solve NP problems (so how are they solved? Fairy dust inside your computer?)
But I thought I’d try to rewrite my blog post in a better way….
I admit I now going slightly out of my depth, but I will try to explain what this is about and why it is interesting.
It is known that computers can solve some problems in what is called “polynomial time“: that is to say a finite time that is proportional to a polynomial of complexity of the input. The key thing is that these problems are computable using known mechanical steps (ie algorithms). Often that means they are solvable in a way that is (sometimes euphemistically) described as “quickly”.
These can be simple problems – like what is the sum of the first ten integers – or more complex ones, such as creating self-balancing trees, sorting database records alphabetically and so on.
These are the so-called “P” (for polynomial time class) problems. Here’s a definition:
A problem is assigned to the P (polynomial time) class if there exists at least one algorithm to solve that problem, such that the number of steps of the algorithm is bounded by a polynomial in , where is the length of the input.
Then there are another class of problems which seem fiendishly difficult to solve but which it is relatively simple to prove the correctness of any solution offered. These problems can also be solved (computed) in a finite time – and they can also be computed by a Turing machine (a simple model of a computer) and so an algorithmic solution exists. It is just that one cannot tell in advance what that algorithm is. In the worst case the only way – it is thought – that a solution can be found is through an exhaustive search of all algorithms – in other words a form of “brute force“. These are the NP (non deterministic polynomial time class) problems.
Now most mathematicians think that NP does not equal P – ie there are no “simple” solutions to the NP problems – and that may or may not be a good thing as much of our internet commerce relies on encryption which is thought to be an NP problem and effectively impervious (due to the time it would take to factor the very large prime numbers at the heart of the process) to brute force attacks if properly done.
(In Afghanistan in 2001 US cryptanalysts seemingly brute forced a Taliban Windows NT box but it used much weaker encryption than most electronic commerce.)
But what if it were the case that all seemingly NP problems were actually P problems? There are a lot of people studying this issue – but according to the New Scientist (their Christmas edition, the first in my subscription and delivered this morning, inspired me to write this) we should expect to wait at least until 2024 for an answer (by which point the problem – first formulated in 1971 – will have reached the age at which 50% of major mathematical problems will have been solved).
This all matters: unlike, say, Fermat’s last theorem, proving P = NP is likely to have more or less immediate effects on the lives of many of us (how would you feel if you knew that it was possible, even if not likely, that someone had developed a way to quickly crack open all your internet credit card transactions?)
Of course, proving P = NP for all problems does not necessarily mean we will immediately have determined polynomial time based solutions for all the current NP problems out there. But we should not expect that to last for long, though it may be the case that the algorithms may be exceptionally complex rendering them very slow even on the fastest computing equipment, but if history teaches us anything it is that computers get faster (yes, I know there are bounds on that).
And, actually, I think the benefits to humanity would be enormous. The most likely immediate effects would be in improvements in computing/operating system efficiency. fast computers for less money and/or less energy consumption would be a huge benefit. From that alone many other benefits will flow in terms of information availability and universality.
But there could be many, many others in medicine and linguistics and even in getting the trains to run on time!
I confess that one of my biggest fears has been that any work I do on my MSc project will be lost in some sort of catastrophic computer disk crash.
Git – and the various free services available, such as github, offered a good way of backing the work up. But given that the project is based on the Linux kernel which is bigger than the official maximum size of a github project, it looked less than hopeful.
(That was not helped by their support staff – one of whom appeared to tell me I should delete the binaries in the kernel and I should be ok.)
So I have gone for unfuddle. It’s not free – $24 a month for a 2 GB repo – but it should do the job with no problem.
Actually, its unlikely, at least in the short term.
I discovered today that the Jet Propulsion Library run an easy to use website on objects that come close to Earth – and then went on to discover that it was predicting an object, travelling at 2000 metres per second relative to the Earth was due to come inside the Moon’s orbit on 2 June.
A quick, back of the envelope, calculation led me to conclude this thing could smash into us with the force of 55 kiloton airbust (later I saw that NASA estimate it’s more like 8 kT) – so even if they have got the calculation wrong it is unlikely to result in the end of humanity unless some idiot launches-on-warning. (Though it would not be nice if it came down over a city as opposed to an ocean).
But what really interested me is that surely we should be describing 2009 BD not as an asteroid but as a satellite of Earth? It orbits the earth with a period of one year as the java applet on the JPL website clearly shows.