After two weeks I was finally able to train a two-layer neural network to solve the XOR problem. Here’s what I learned:
- Learn the backpropagation algorithm thoroughly. Most of the resources I initially found focused on proving that the algorithm works rather than on how to code it. To implement backpropagation, all you need is a basic understanding (which I’ll explain in a later post). To complicate matters, you also need to watch out for misinformation: plenty of descriptions of the algorithm update the weights incorrectly (for example, this pdf).
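To make the "basic understanding" concrete, here is a minimal sketch of one backpropagation step for a two-layer network on a single example. It assumes sigmoid activations and a squared-error loss, and it is not the repo's actual code; the function name and shapes are my own for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_single(x, t, W1, W2, lr=0.5):
    """One backprop step on one example.

    x: input vector, t: target vector,
    W1: hidden-layer weights, W2: output-layer weights.
    Returns the updated weights and the network's output y.
    """
    # Forward pass
    h = sigmoid(W1 @ x)                      # hidden activations
    y = sigmoid(W2 @ h)                      # output activations
    # Backward pass: error signals for sigmoid + squared error
    delta_out = (y - t) * y * (1 - y)
    delta_hid = (W2.T @ delta_out) * h * (1 - h)
    # Gradient-descent weight updates
    W2 = W2 - lr * np.outer(delta_out, h)
    W1 = W1 - lr * np.outer(delta_hid, x)
    return W1, W2, y
```

The key point the proofs tend to bury: the backward pass is just the chain rule applied layer by layer, and each weight update uses the activation feeding *into* that weight.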
- Learn the backpropagation algorithm for multiple examples (batch mode). Again, most of the resources I initially found focused on training networks one example at a time (sequential mode). Training a network one example at a time yields different results than accumulating the gradients over multiple examples and updating the weights once.
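The batch-mode difference can be sketched like this: sum the gradients over every example first, then apply a single weight update. Again this is a hypothetical NumPy sketch (sigmoid activations, squared error), not the repo's code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_batch(X, T, W1, W2, lr=0.5):
    """One batch-mode update.

    X: inputs (one example per row), T: targets (one per row).
    Gradients are summed over all examples, then the weights
    are updated once — unlike sequential mode, which updates
    after every single example.
    """
    dW1 = np.zeros_like(W1)
    dW2 = np.zeros_like(W2)
    for x, t in zip(X, T):
        h = sigmoid(W1 @ x)
        y = sigmoid(W2 @ h)
        delta_out = (y - t) * y * (1 - y)
        delta_hid = (W2.T @ delta_out) * h * (1 - h)
        dW2 += np.outer(delta_out, h)
        dW1 += np.outer(delta_hid, x)
    return W1 - lr * dW1, W2 - lr * dW2
```

Sequential mode changes the weights between examples, so the second example sees different weights than it would in batch mode; the two trajectories genuinely diverge.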
- Unit test, unit test, unit test. It’s very easy to make a fat-finger mistake that breaks the implementation in subtle ways. I ended up writing acceptance tests for everything I coded, and I was surprised at the fat-finger errors I found (for example, I had made a typo in the activation function, which I could have sworn was coded correctly). When it came time to finally test the backpropagation algorithm, I compared the results with manual calculations for simple networks.
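Besides comparing against manual calculations, one test that catches typos like the one above is a numerical gradient check: perturb each weight slightly and compare the finite-difference slope against what backpropagation reports. A sketch under the same assumptions as before (sigmoid, squared error; names are mine, not the repo's):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(W1, W2, x, t):
    """Squared error of a two-layer sigmoid network on one example."""
    y = sigmoid(W2 @ sigmoid(W1 @ x))
    return 0.5 * float(np.sum((y - t) ** 2))

def backprop_grads(W1, W2, x, t):
    """Analytic gradients of the loss w.r.t. W1 and W2."""
    h = sigmoid(W1 @ x)
    y = sigmoid(W2 @ h)
    delta_out = (y - t) * y * (1 - y)
    delta_hid = (W2.T @ delta_out) * h * (1 - h)
    return np.outer(delta_hid, x), np.outer(delta_out, h)

def numerical_grad(f, W, eps=1e-5):
    """Central finite differences, one weight at a time.

    Mutates W temporarily; restores it before returning.
    """
    g = np.zeros_like(W)
    for i in np.ndindex(W.shape):
        W[i] += eps; hi = f()
        W[i] -= 2 * eps; lo = f()
        W[i] += eps
        g[i] = (hi - lo) / (2 * eps)
    return g
```

If the analytic and numerical gradients disagree beyond a small tolerance, something in the forward or backward pass is wrong, and the mismatched entry usually points right at the buggy weight layer.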
- Develop an intuitive understanding of the learning and momentum parameters. Contrary to what I originally thought, networks cannot learn with arbitrary learning and momentum parameters. You might think the implementation is broken when you’re just using the wrong parameters. So start with a simple two-layer network and train it to learn one example. Explore how the parameters affect the rate of learning, or whether the network learns at all. Repeat with two examples. In general, you’ll find that difficult problems require a smaller learning parameter (which in turn increases the number of iterations of the backpropagation algorithm required). The momentum keeps the network from getting stuck, but if it’s too large then it also keeps the network from learning.
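For reference, the standard momentum update keeps a "velocity" per weight that accumulates a decaying average of past gradients. A minimal sketch (my own naming, not the repo's API) that you can experiment with to build the intuition above:

```python
import numpy as np

def momentum_step(W, grad, velocity, lr=0.1, momentum=0.9):
    """One gradient-descent step with momentum.

    velocity carries a decaying sum of past gradients, so updates
    keep moving through flat regions — but a momentum value too
    close to 1 makes the weights overshoot and oscillate instead
    of settling.
    """
    velocity = momentum * velocity - lr * grad
    return W + velocity, velocity
```

Sweeping `lr` and `momentum` on a toy problem (one example, then two) shows the trade-off directly: small `lr` learns slowly but reliably, large `momentum` escapes plateaus but can refuse to converge.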
A later post will provide links to resources and explain my implementation of the backpropagation algorithm.
Yes, code is available. Take a look at https://github.com/frankandrobot/mustached-octo-neuralnetwork.