This is the first blog post in my neural network series and primarily serves as an intro. I aim to use this series to explain fundamental concepts (as well as I know them) and how they have helped me build better models. I will approach this not only from an academic perspective but from an industry perspective as well.
I am currently concluding my thesis with the University of Cape Town. My topic is “Predicting Social Unrest Events Using LSTM Neural Networks and ARIMA: A South Africa Case Study”.
To get started, the first thing I did was jump straight into the model-building process. In my defense, I had already built a couple of LSTM neural networks and had some understanding of how they work. I ran both models, ARIMA and LSTM, with success (or so I thought at the time). I was able to show that LSTMs were better at learning non-linear dependencies in the data, among other benefits they have over ARIMA. Happiness. Then I submitted the report to my supervisor. He came back with all sorts of questions:
- Why did you scale the data?
- Why did you choose this learning rate?
- Why do different activation functions require different scaling procedures?
- Why did you choose this weight initialisation procedure?
- Why this architecture?
- How did you determine the learning rate, and why does it matter? And so on.
My answer was simple: “because sklearn’s grid search said it was the optimal result as this configuration produced the lowest learning rate with the least amount of variation between experiments”. This was not the answer my supervisor was looking for. However, in industry my answer would have been acceptable as one just needs to add value and move on…
My supervisor expected me to provide all the explanations in mathematical terms along with a conceptual understanding. This meant going back to my notes and reviewing neural networks in depth, which took up the next couple of months. I started by mapping out the computations within the three base concepts:
- Gradient descent methods
To achieve this I used varied source material, of which there is an abundance. However, I found that most of the material gets complex very quickly, especially for someone who last did calculus in their undergraduate years. In addition, most explanations lack the visual and interactive aspects that make learning interesting. This is one of my motivations for creating a neural network blog series. I hope to simplify these concepts for those who are interested in understanding the underlying mathematical reasoning behind neural networks but do not want to dive in toooo deep. I combine knowledge from my Masters with some online source material.
Content on neural networks that I recommend includes:
My next series of posts will cover the fundamentals of neural networks, implemented in an interactive spreadsheet. I am still learning more every day and am looking forward to seeing how this turns out and how far the spreadsheet can expand.