Convolutional network (again)

Black king on black square with random filters

With time on my hands I have returned to working on an old project – attempting to build a convolutional network that will solve chess puzzles.

(A convolutional network is a type of neural network – a modelled ‘artificial intelligence’ that can be used to detect patterns or undertake similar tasks.)

Here I am not using ‘AI’ to solve the chess puzzle itself (though there are very large libraries of chess endings and positions available, so I suppose that would be possible), but to read the chess position in the puzzle.

Thus the task is to classify squares on the board.

I tried this a couple of years ago and got nowhere, but reading the book “Machine Learning: An Applied Mathematics Introduction” has persuaded me to have another go, reducing the dimensions of the answer I am seeking from 25 to 9 (without any loss of information).

At the moment I am just in the process of building the “feed forward” network – i.e. the neural network that, once trained, will take an image as input and then give a nine-dimensional answer.

These answers can be thought of, perhaps not too accurately but not totally unreasonably, as a measure of likelihood that the input picture falls into a given category (e.g. by giving a number between 0 and 1 under the category of white square, or pawn, or black piece etc.).
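As a rough illustration of what “a number between 0 and 1” per category could look like, here is a minimal sketch that squashes nine raw output values with a sigmoid function. The choice of sigmoid and the raw values themselves are assumptions made purely for the example; the post does not say which function the network actually uses.

```python
import math

def sigmoid(x):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Nine made-up raw outputs, one per category (hypothetical values).
raw_outputs = [2.0, -1.5, 0.0, 3.2, -4.0, 0.7, -0.3, 1.1, -2.2]

# Each score can be read, loosely, as "how likely is this category?"
scores = [sigmoid(x) for x in raw_outputs]
```

A raw value of zero maps to exactly 0.5, large positive values approach 1 and large negative values approach 0.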

The input picture is passed through a series of filters that are designed to extract features of the image and then, at the end, the AI considers all the results and gives its view as to the classification of the image.

In my AI there are 50 fibres (i.e. 50 chains of filters) and the image at the top of the page shows the results of passing the image – a black king on a black square – through the top two layers. So the first 50 images are from the top rank of filters and the bottom from the second rank. I plan to implement another three layers of filters (though of smaller dimensions – the idea being they can concentrate their information) before the final “fully connected” layer (where all 50 fibres exchange information) that delivers the result.

The images here are produced from randomly assigned filters so essentially contain no real “intelligence” at all – but if you magnify the image you’ll see that even these random filters produce interesting results.
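To make the idea of passing an image through a randomly assigned filter concrete, here is a small sketch in plain Python. It is not the project's code: the 6x6 toy image, the 3x3 filter size, the range of the random weights and the use of a ReLU (clamping negatives to zero) are all assumptions chosen for illustration.

```python
import random

def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding) and return a feature map.

    As in most conv-nets this is strictly a cross-correlation: the kernel
    is not flipped. Negative results are clamped to zero (ReLU, an assumption).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(max(0.0, total))
        feature_map.append(row)
    return feature_map

def random_kernel(size, rng):
    """A size x size filter with weights drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(size)] for _ in range(size)]

rng = random.Random(0)  # fixed seed so the run is repeatable
image = [[(r + c) % 2 for c in range(6)] for r in range(6)]  # toy checkerboard
feature_map = convolve2d(image, random_kernel(3, rng))
```

A 3x3 filter over a 6x6 image gives a 4x4 feature map; a chain of such filters, each fed by the output of the previous one, would correspond to one “fibre” in the sense used above.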

Training the network is vital of course – and that’s where it all failed last time. I’m back to reading Springer’s “Guide to Convolutional Neural Networks” – which is one of their better books but still full of shoddy editing (though I’d recommend persisting with it).

The training is through ‘back propagation’ – essentially adjusting the network to minimise errors by testing it against a set of known results. Getting a large set of pictures to do the training against is maybe even more difficult than getting the maths of the training right. Even if I recycle the images from last time I will need a lot more.
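The principle of adjusting a network against known results can be shown with a deliberately tiny sketch: a single neuron with one weight and one bias, trained by gradient descent. Everything here (the sigmoid neuron, the squared-error loss, the learning rate and the two training samples) is an assumption for illustration; a real conv-net applies the same chain-rule step through every layer and filter.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, epochs=2000, rate=1.0):
    """Fit y = sigmoid(w*x + b) to (x, target) pairs by gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            y = sigmoid(w * x + b)      # forward pass
            error = y - target          # dLoss/dy for loss = 0.5 * (y - target)**2
            dz = error * y * (1.0 - y)  # chain rule back through the sigmoid
            w -= rate * dz * x          # since d(w*x + b)/dw = x
            b -= rate * dz              # since d(w*x + b)/db = 1
    return w, b

samples = [(0.0, 0.0), (1.0, 1.0)]  # toy "known results"
w, b = train(samples)
predictions = [sigmoid(w * x + b) for x, _ in samples]
```

After training, the prediction for the first sample sits close to 0 and the prediction for the second close to 1.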

Conv-nets are hard

Months ago I started work on a convolutional neural network to recognise chess puzzles. This evening after mucking about with the learning phase for weeks I thought I had scored a breakthrough – that magic moment when, during learning, a tracked value suddenly flips from the wrong result to the right one.

Brilliant, I thought – this is about to actually work, and I started tracking another value. Only to come back to the original and see that it had all gone wrong again.

False minima abound in training – which is essentially about getting the right coefficients for a large set of non-linear equations each with many thousands of parameters. Or maybe it wasn’t a false minimum at all but the real one, and it just holds over an extremely small range of parameter values.
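A one-dimensional toy shows how gradient descent can settle in a false minimum depending on where it starts. The function below is an arbitrary example chosen because it has two dips of different depth; it has nothing to do with the real network's error surface, which has thousands of dimensions rather than one.

```python
def f(x):
    """Toy 'error' curve with a shallow dip on the right and a deeper one on the left."""
    return x**4 - 3 * x**2 + x

def df(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1

def descend(x, rate=0.01, steps=1000):
    """Plain gradient descent: keep stepping downhill from the starting point."""
    for _ in range(steps):
        x -= rate * df(x)
    return x

from_right = descend(2.0)   # settles near x = 1.13, the shallower (false) minimum
from_left = descend(-2.0)   # settles near x = -1.30, the deeper minimum
```

Both runs stop where the slope is zero, but only the run started from the left finds the lower value of `f`. Nothing in the descent itself reveals which of the two it has reached.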

Will I ever find it again? And if I do, can I find it for the other 25 classification results too?

(As an aside: I made the code parallel to speed it up, but it’s a classic example of Amdahl’s law – even on a machine with many more processors than the 26 threads I need and with no shortage of memory, the speed-up is between 3 and 4 even with the most heavy-duty calculations run in parallel.)
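Amdahl's law puts numbers on that aside. If a fraction p of the run time can be parallelised over n threads, the best possible speed-up is 1 / ((1 - p) + p / n). The sketch below inverts the formula to estimate what fraction of the run time must have been parallel to give a speed-up of 3 or 4 on 26 threads. It assumes ideal scaling of the parallel part, with no overhead from thread management, so it is only a back-of-envelope estimate.

```python
def amdahl_speedup(p, n):
    """Ideal speed-up when a fraction p of the work runs on n threads."""
    return 1.0 / ((1.0 - p) + p / n)

def parallel_fraction(speedup, n):
    """Invert Amdahl's law: the fraction p implied by an observed speed-up."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)

threads = 26
p_low = parallel_fraction(3.0, threads)   # roughly 0.69
p_high = parallel_fraction(4.0, threads)  # roughly 0.78

# Upper bound on speed-up as the thread count grows without limit: 1 / (1 - p)
ceiling_low = 1.0 / (1.0 - p_low)
ceiling_high = 1.0 / (1.0 - p_high)
```

On those assumptions something like 70 to 80 per cent of the run time is parallel, and even with unlimited processors the speed-up would top out at about 3.3 to 4.5.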
