Because I essentially got it wrong in the last post … turns out that a fully connected network is, generally, not a great idea for image processing and that partial connections – through “convolution layers” are likely to be more efficient.
And my practical experience backs this up: my first NN did, in effect, have two convolution layers (or filters), although somewhat eccentrically designed as 100 x 1 and 1 x 100 filters. And this network performs better than the single hidden layer fully connected alternative. That may just be because it takes an age to train the fully connected network and converge of the error levels towards a low number is just taking for ever (a convolution layers has many fewer connections and so can be trained much faster).