Conv-nets are hard

Months ago I started work on a convolutional neural network to recognise chess puzzles. This evening, after weeks of mucking about with the learning phase, I thought I had scored a breakthrough: that magic moment when, during learning, a tracked value suddenly flips from the wrong result to the right one.

Brilliant, I thought – this is about to actually work, and I started tracking another value. Only to come back to the original and see that it had all gone wrong again.

False minima abound in training, which is essentially about finding the right coefficients for a large set of non-linear equations, each with many thousands of parameters. Or maybe it wasn’t a false minimum at all but the real one, and it just holds over an extremely small range of parameter values.
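
To make the false-minimum idea concrete, here is a toy sketch, nothing like the real network: plain gradient descent on a made-up one-dimensional function with two dips. The function, the learning rate and the starting points are all assumptions chosen purely for illustration.

```python
# Toy illustration only: a hypothetical 1-D "loss" with two minima,
# standing in for the many-thousand-parameter problem in the post.

def loss(x):
    return x**4 - 3 * x**2 + x

def grad(x):
    # Derivative of loss(x) with respect to x.
    return 4 * x**3 - 6 * x + 1

def descend(x, learning_rate=0.01, steps=500):
    # Plain gradient descent: repeatedly step downhill from the start point.
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x

x_right = descend(2.0)    # start on the right-hand slope
x_left = descend(-2.0)    # start on the left-hand slope

print(f"from  2.0: x = {x_right:.3f}, loss = {loss(x_right):.3f}")
print(f"from -2.0: x = {x_left:.3f}, loss = {loss(x_left):.3f}")
```

Both runs settle happily, but the one starting on the right ends up in the shallower dip, and nothing in the descent itself says a deeper one exists elsewhere.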

Will I ever find it again? And if I do, can I find it for the other 25 classification results too?

(As an aside: I made the code parallel to speed it up, but it’s a classic example of Amdahl’s law – even on a machine with many more processors than the 26 threads I need and with no shortage of memory, the speed-up is between 3 and 4 even with the most heavy-duty calculations run in parallel.)
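
For the curious, Amdahl’s law can be turned around to estimate how much of the run time must actually be parallel to give a speed-up like that. This is a back-of-envelope sketch only: it assumes the textbook form of the law, S = 1 / ((1 - p) + p / n), with no threading overhead, and the function names are my own.

```python
# Back-of-envelope Amdahl's law, assuming the idealised textbook form
# with no synchronisation or scheduling overhead.

def amdahl_speedup(p, n):
    # Speed-up when a fraction p of the work runs perfectly on n threads.
    return 1.0 / ((1.0 - p) + p / n)

def parallel_fraction(speedup, n):
    # Invert the law: 1/S = 1 - p * (1 - 1/n), solved for p.
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)

n_threads = 26  # thread count from the post

for observed in (3.0, 4.0):
    p = parallel_fraction(observed, n_threads)
    ceiling = 1.0 / (1.0 - p)  # limit as the thread count grows without bound
    print(f"speed-up {observed:.0f}: parallel fraction {p:.1%}, "
          f"ceiling with unlimited threads {ceiling:.2f}")
```

Under those assumptions a speed-up of 3 to 4 on 26 threads corresponds to only about 69% to 78% of the run time being parallel, so the serial remainder dominates and extra processors barely help.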