Gradient descent training of sigmoidal feed-forward neural networks on binary mappings often gets stuck with some outputs completely wrong. This happens because a sum-squared-error cost function leads to weight updates that are proportional to the derivative of the output sigmoid, and that derivative goes to zero as the output approaches maximal error. Although the cause is easy to understand, the best remedy is not so obvious. Common solutions involve modifying the training data, deviating from true gradient descent, or changing the cost function. In general, finding the best learning procedure for a particular class of problem is difficult, because each procedure usually depends on a number of interacting parameters that must all be set to their optimal values before a fair comparison can be made. In this paper I shall use simulated evolution to optimise all the relevant parameters, and come to a clear conclusion concerning the most efficient approach for learning binary mappings.
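The vanishing update described above can be illustrated with a minimal sketch for a single sigmoid output unit. The function names are hypothetical, and the cross-entropy cost is an assumption used only as one familiar example of a changed cost function; the text above does not name a specific alternative.

```python
import math


def sigmoid(net):
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-net))


def sse_delta(net, target):
    """Output error signal for E = 0.5 * (target - out)**2.

    The factor out * (1 - out) is the sigmoid derivative, which
    vanishes as the output saturates at either 0 or 1.
    """
    out = sigmoid(net)
    return (target - out) * out * (1.0 - out)


def cross_entropy_delta(net, target):
    """Output error signal for a cross-entropy cost (assumed alternative).

    The sigmoid derivative cancels, leaving only (target - out).
    """
    out = sigmoid(net)
    return target - out


if __name__ == "__main__":
    target = 1.0
    # Increasingly negative net input: the output moves towards 0,
    # i.e. towards maximal error for a target of 1.
    for net in (0.0, -2.0, -5.0, -10.0):
        print(net, sigmoid(net),
              sse_delta(net, target),
              cross_entropy_delta(net, target))
```

With a target of 1, the sum-squared-error signal shrinks towards zero as the net input becomes more negative, even though the error itself is approaching its maximum, whereas the cross-entropy signal approaches 1.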