I am working on a project to see whether, using various “deep learning” methods, it is possible to take a photograph of some musical notation and play it back. (I was inspired to do this by having a copy of 1955’s Labour Party Songbook and wondering what many of the songs sounded like.)
The first task is to identify which parts of the page contain musical notation and I have been working with a training set built from pictures of music chopped into 100 x 100 pixel blocks – each is labelled as containing or not containing musical notation and the network is trained, using back propagation, to attempt to recognise these segments automatically.
Now I have tested it for the first time and the results are interesting – but a bit disappointing. In this image all that is plotted is the neural net’s output: the redder the image, the higher the output from the net’s single output neuron:
It’s a bit of a mystery to me as to why you can see the staves and notes in this sort of shadowy form: as that means the network is rejecting them as musical notation even as it does highlight the regions where they are found as the best places to look.
To make it all a bit clearer, here are the results with the blue/green pixels of the original image unchanged and the red pixels set on the strength of the network’s output:
It seems clear the network is, more or less, detecting where there is writing on the page – though with some bias towards musical staves.
I’m not too disappointed. My approach – based on stuff I read in a book almost 25 years old – was probably a bit naïve in any case. I came across a much more recent and what looks to be much more relevant text yesterday and that’s what I will be reading in the next few days.
Well, Douglas Hofstadter has now come to his own (partial) rescue as I continue to read on through Godel, Escher, Bach – as he describes Tarski’s Theorem, which essentially states that humans cannot determine all such statements either (unless we posit the Church-Turing thesis is wrong, of course and there is some inner human computational magic we have yet to describe).
I am now going to quickly run through Hofstadter’s exposition – it might not mean too much to those of you not familiar with GEB, but if so and if you are interested in computation (and genetics and music) and you want to improve your mind this summer you could always think about buying the book. I don’t promise it’s an easy read, the style can vary from the nerdy to the deeply frustrating, but it is still a rewarding read.
So here goes:
We imagine we have a procedure that can determine the truth of a number theoretical statement, i.e.:
can tell us whether the number theoretical statement with Goedel number a is true.
So now we posit, with Goedel number :
Now, of you have read GEB you will know that to “arithmoquine” a number theoretical statement is to replace the free variable – in this case , with the Goedel number for the statement itself…
Which we can state as “the arithmoquinification of is the Goedel number of a false statement”.
But the arithmoquinification of is ‘s own number, so this statement is the equivalent of saying “this statement is false”: just another version of the famous Epimenides Paradox, but one that is decidedly not hand waving in form: it’s about natural numbers.
The outcome is that cannot exist without our whole idea of natural numbers collapsing and we are forced to conclude there is no formal way of deciding what is a true statement of theoretical number theory using only theoretical number theory – and so humans are no better off than computers in this regard: we use concepts from without the formal theory to establish truth and we could surely program our computers to do the same. Turing’s “imitation game” conception of intelligent machines thus survives.