Giving up on the convolutional network?


For almost three months now I have been trying to build and train a convolutional network that will recognise chess puzzles, but I don’t feel I am any closer to succeeding than I was at the start of September, and so I wonder if I should just give up.

The network itself is built and, as far as I can see, works – except for the fact that I just cannot get it to converge on the training set.

The (learning) code is here: https://github.com/mcmenaminadrian/ChessNet/tree/learning

The training set is here: https://github.com/mcmenaminadrian/ChessNet/tree/learning/images

There are 25 possible output classes – from an empty white square to a black square hosting a black king – and for each class the network outputs a value between -1 (no match) and 1 (perfect match).

There are 25 convolutional fibres, each with seven layers, going from a 100 x 100 input layer to a final filter (feature map) of 88 x 88. These feature maps are then fully connected to 25 output neurons (there is no pooling layer). As you can see, that means there are 88 x 88 x 25 x 25 + 25 connections at the final, fully connected layer – 4.84 million weights, plus 25 for bias – or, put another way, each output neuron has 193,601 input connections.
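To make that arithmetic concrete, here is a minimal sketch of the connection count – in Python, for illustration only, not taken from the ChessNet code – using just the figures quoted above:

```python
# Rough sketch of the connection count at the final, fully connected layer,
# using only the figures quoted above (illustration only, not the ChessNet code).

FEATURE_MAP_SIDE = 88   # final feature map is 88 x 88
NUM_FIBRES = 25         # 25 convolutional fibres feed the fully connected layer
NUM_OUTPUTS = 25        # 25 output neurons, one per class

weights_per_output = FEATURE_MAP_SIDE * FEATURE_MAP_SIDE * NUM_FIBRES  # 193,600
connections_per_output = weights_per_output + 1                        # plus bias = 193,601

total_weights = weights_per_output * NUM_OUTPUTS                       # 4,840,000
total_connections = total_weights + NUM_OUTPUTS                        # 4,840,025 with biases

print(f"per output neuron: {connections_per_output:,} connections")
print(f"whole layer:       {total_connections:,} connections")
```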

Perhaps the issue is that the scale of the fully connected layer dwarfs the output and influence of the feature maps? I don’t know, but what I do know is that, as training goes along, the output neurons generally begin in a low (i.e., close to -1) state and then edge towards a high state, but as they do they are suddenly overwhelmed and everything returns to an even lower state than before.

Envisaging this as a three-dimensional surface, we creep up a steep hillside and then fall down an even deeper hole just as we appear to be getting towards the summit. The problem seems to be that training doesn’t really teach the network to differentiate between any of the training images; it just pushes the network towards a high value. Then, suddenly, images which should be reported as low are reported as high, and the error values flood the network on back propagation.

To explain further: for any given class, our image X will always be relatively infrequently seen in the training set, so most results should be low and are low, with small error values (deltas, as they are usually called) – so small that they are generally ignored. The deltas for X itself are then large, and as they feed back into the network they drag our response towards high. Eventually we cross a threshold and all the results – for good and bad images alike – are reported as high, and so there are lots of big deltas which overwhelm the small number of correct positives. At least that is what I think is happening.
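To put rough numbers on that (these are made-up figures for illustration, not from the ChessNet code): with one-vs-all targets of -1 and +1, and only a small share of the images belonging to any given class, the balance of deltas flips dramatically once an output drifts high. A toy Python sketch, assuming a 40:960 split of positives to negatives and an arbitrary cut-off below which a delta is ignored:

```python
# Toy illustration of the delta imbalance described above (made-up figures,
# not the ChessNet code): one output neuron, targets of +1 for "its" class
# and -1 for everything else, and small deltas discarded below a cut-off.

POSITIVES, NEGATIVES = 40, 960   # assumed split of training images for one class
THRESHOLD = 0.5                  # assumed cut-off below which a delta is ignored

def summed_delta(output):
    """Total back-propagated error for one pass at a given output level."""
    delta_pos = 1.0 - output     # error on images that should read high
    delta_neg = -1.0 - output    # error on images that should read low
    total = 0.0
    if abs(delta_pos) >= THRESHOLD:
        total += POSITIVES * delta_pos
    if abs(delta_neg) >= THRESHOLD:
        total += NEGATIVES * delta_neg
    return total

# While the neuron sits low, only the rare positives produce deltas big enough
# to count, so the net pressure is upwards...
print(summed_delta(-0.9))   #    76.0 -> dragged towards +1

# ...but once it has drifted high, every negative image produces a large delta,
# and the sheer number of them swamps the positives.
print(summed_delta(0.8))    # -1728.0 -> slammed back down
```

The flip from +76 to -1728 in the toy numbers mirrors the collapse described above: the rare positives can slowly drag the output up, but once the negatives’ deltas clear the cut-off they dominate by sheer weight of numbers.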

Of course what really should happen is that the network learns to discriminate between the ‘good’ and ‘bad’ images, but that just seems as far away as ever.

Any tips, beyond giving up, gratefully received.


Conv-nets are hard


Months ago I started work on a convolutional neural network to recognise chess puzzles. This evening, after mucking about with the learning phase for weeks, I thought I had scored a breakthrough – that magic moment when, during learning, a tracked value suddenly flips from the wrong result to the right one.

Brilliant, I thought – this is actually about to work – and I started tracking another value, only to come back to the original and see that it had all gone wrong again.

False minima abound in training – which is essentially about finding the right coefficients for a large set of non-linear equations, each with many thousands of parameters. Or maybe it wasn’t a false minimum at all but the real one, just operating over an extremely small range of parameter values.

Will I ever find it again? And if I do, can I find it for the other 25 classification results too?

(As an aside: I made the code parallel to speed it up, but it’s a classic example of Amdahl’s law – even on a machine with many more processors than the 26 threads I need, and with no shortage of memory, the speed-up is only between 3 and 4, even with the most heavy-duty calculations run in parallel.)
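For what it’s worth, plugging those numbers into Amdahl’s law suggests something like a quarter to a third of the run is still serial. A back-of-the-envelope Python check (my own arithmetic, not anything from the ChessNet code), assuming the 26 threads are the effective limit on parallelism:

```python
# Back-of-the-envelope Amdahl's law check (my own arithmetic, not from the code).
# Speed-up S on n workers with parallel fraction p:  S = 1 / ((1 - p) + p / n)
# Rearranged to ask what fraction must be parallel to explain a measured
# speed-up:  p = (1 - 1/S) / (1 - 1/n)

def parallel_fraction(speedup, n_workers=26):
    return (1 - 1 / speedup) / (1 - 1 / n_workers)

for s in (3.0, 4.0):
    p = parallel_fraction(s)
    print(f"speed-up {s}: ~{p:.0%} parallel, ~{1 - p:.0%} serial")

# speed-up 3.0: ~69% parallel, ~31% serial
# speed-up 4.0: ~78% parallel, ~22% serial
```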