Neural Networks Learning The Basics: Layers, Activation

This post continues from neural network basics part 1: Layers Matrix Multiplication, and therefore assumes that you have a basic knowledge of what a neuron is.

In neural network basics part 1: Layers Matrix Multiplication, I covered matrix multiplication and defined a simple neural network as the weighted sum of its inputs:

$Y = \sum{input*weight}$

If we include the bias term, which is added once to the weighted sum rather than to each product:

$Y = \sum{input*weight} + bias$
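A minimal Python sketch of a single neuron's weighted sum may make the formula concrete. The bias is added once, after the input*weight products have been summed. The function name `weighted_sum` and the input, weight and bias values are hypothetical and chosen only for illustration; they are not the values from the part 1 example.

```python
def weighted_sum(inputs, weights, bias):
    """Return Y = sum(input * weight) + bias for a single neuron."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Sum the products first, then add the bias a single time
    return sum(x * w for x, w in zip(inputs, weights)) + bias

# Hypothetical values, for illustration only
inputs = [1.0, 2.0, 3.0]
weights = [0.5, -0.25, 0.1]
bias = 0.2

y = weighted_sum(inputs, weights, bias)
print(y)
```

Note that nothing here limits the size of `y`: larger inputs or weights simply produce a larger (or more negative) sum.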

In the equations above, $Y$ can take any value between $-\infty$ and $+\infty$. An activation function constrains or reshapes these values; the sigmoid function, for example, squashes $Y$ into the range 0 to 1. This property is critical to the learning process, as we will see in the Backpropagation post. Applying an activation function F() transforms the equation to:

$Z = F(Y)$

When F() is non-linear, it applies a non-linear transformation to the output, giving the neural network the capability to learn complex patterns. Without F(), the equation is simply a linear transformation, and stacking several such layers still yields a single linear transformation, because a linear function of a linear function is itself linear. A linear model cannot learn more complex, non-linear patterns, so a neural network without an activation function is effectively a linear regression model.
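To see why stacking layers without an activation does not add capability, consider two single-input neurons chained together with no F(). Substituting one into the other gives $w_2(w_1 x + b_1) + b_2 = (w_2 w_1)x + (w_2 b_1 + b_2)$, which is just one neuron with a different weight and bias. The sketch below checks this numerically and, for contrast, applies tanh after each neuron. The weights and biases (`w1`, `b1`, `w2`, `b2`) are hypothetical values chosen for illustration.

```python
import math

# Hypothetical weights and biases for two chained single-input neurons
w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 0.5

def two_linear_layers(x):
    """Second neuron applied to the first neuron's output, no activation."""
    return w2 * (w1 * x + b1) + b2

def one_linear_layer(x):
    """A single neuron whose weight and bias absorb both layers."""
    return (w2 * w1) * x + (w2 * b1 + b2)

def two_tanh_layers(x):
    """The same two neurons with tanh applied after each weighted sum."""
    return math.tanh(w2 * math.tanh(w1 * x + b1) + b2)

for x in [-2.0, 0.0, 1.5]:
    # The two linear versions agree; the tanh version is bounded in (-1, 1)
    print(x, two_linear_layers(x), one_linear_layer(x), two_tanh_layers(x))
```

However many linear layers are chained, the same substitution collapses them into one weight and one bias; only the non-linear F() breaks this.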

How do we apply this transformation to our data?

Popular activation functions in the machine learning literature include:

1. Step function
2. Linear function
3. Sigmoid function
4. Hyperbolic tangent function (tanh)
5. Rectified linear unit (ReLU)

These functions can be visualized as:

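These activation function curves were computed in the MS Excel document below, and we can see that they have different output ranges: the step function outputs only 0 or 1, sigmoid lies between 0 and 1, tanh between -1 and 1, ReLU runs from 0 upwards, and the linear function is unbounded.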
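For readers who prefer code to a spreadsheet, the five functions in the list can be sketched in Python and evaluated on the same inputs to compare their ranges. Two assumptions are worth flagging: the step function here returns 1 at exactly 0 (conventions differ, and some definitions use 0 or 0.5), and the sigmoid is written in a numerically stable form so that very negative inputs do not overflow `math.exp`. The function names and the `activations` dictionary are illustrative, not taken from any library.

```python
import math

def step(y):
    """Step function: 1 if y >= 0, else 0 (assumes step(0) = 1)."""
    return 1.0 if y >= 0 else 0.0

def linear(y):
    """Linear (identity) function: output equals input, unbounded."""
    return y

def sigmoid(y):
    """Sigmoid 1 / (1 + e^-y), squashing y into the range 0 to 1."""
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    # Equivalent form for negative y that avoids overflow in math.exp
    e = math.exp(y)
    return e / (1.0 + e)

def tanh(y):
    """Hyperbolic tangent, squashing y into the range -1 to 1."""
    return math.tanh(y)

def relu(y):
    """ReLU: 0 for negative y, y otherwise (no upper bound)."""
    return max(0.0, y)

activations = {"step": step, "linear": linear, "sigmoid": sigmoid,
               "tanh": tanh, "relu": relu}

for name, f in activations.items():
    # Evaluate each function on the same sample inputs to compare ranges
    print(name, [round(f(y), 4) for y in (-5.0, -1.0, 0.0, 1.0, 5.0)])
```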

Let us use the hyperbolic tangent function (tanh) as an example. The tanh function is defined as:

$F(Y) = \frac {e^{Y} - e^{-Y}} {e^{Y} + e^{-Y}}$

For example, if $Y = 3$, the Excel formula gives F(3) = (EXP(3)-EXP(-3))/(EXP(3)+EXP(-3)) = 0.995055, which is already close to the upper bound of 1.
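The same arithmetic can be checked in Python. The helper `tanh_from_exponentials` (a hypothetical name) mirrors the Excel formula term by term, and the standard library's `math.tanh` is used only as a cross-check.

```python
import math

def tanh_from_exponentials(y):
    """tanh(y) = (e^y - e^-y) / (e^y + e^-y), mirroring the Excel formula."""
    return (math.exp(y) - math.exp(-y)) / (math.exp(y) + math.exp(-y))

value = tanh_from_exponentials(3)
print(round(value, 6))                    # → 0.995055
print(math.isclose(value, math.tanh(3)))  # → True
```

Writing the formula out is useful for seeing how it works, but for large $|Y|$ (roughly above 710) `math.exp` overflows, so in practice the built-in `math.tanh` is the safer choice.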

Let us apply the tanh transformation to the Excel example.
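The Excel sheet itself is not reproduced here, so the following is only a sketch of the same idea with made-up numbers: a layer of two neurons with three inputs. It computes each neuron's weighted sum $Y$ plus bias, as in part 1, and then applies tanh to each value to get $Z = F(Y)$. The inputs, weights, biases and the `layer_forward` helper are hypothetical.

```python
import math

# Hypothetical values: 3 inputs feeding a layer of 2 neurons
inputs = [0.5, -1.0, 2.0]
weights = [             # one row of weights per neuron
    [0.2, 0.8, -0.5],
    [-1.5, 0.3, 0.9],
]
biases = [0.1, -0.2]

def layer_forward(inputs, weights, biases, activation=math.tanh):
    """Compute Y = sum(input * weight) + bias per neuron, then Z = F(Y)."""
    ys = [sum(w * x for w, x in zip(row, inputs)) + b
          for row, b in zip(weights, biases)]
    zs = [activation(y) for y in ys]  # activation applied element-wise
    return ys, zs

ys, zs = layer_forward(inputs, weights, biases)
print("Y (before activation):", ys)
print("Z (after tanh):       ", zs)
```

Passing the activation as a parameter makes it easy to swap tanh for another function, such as a sigmoid or ReLU, without changing the weighted-sum step.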

In this post I explained how an activation function is applied in a neural network, which completes the picture of how neuron values are computed. Recall that the weights are trainable parameters and that in my example I used random values rather than optimal ones. So how does a neural network learn the optimal values? It needs some sort of learning rule so that it can learn how best to represent $Y$. In the next blog post I will look at the loss function and Backpropagation.
Backpropagation is how the neural network learns the weight values that best represent $Y$. The weights that best represent $Y$ are those that produce the lowest error, and the loss function is what calculates that error. This process forms the backbone of neural networks, and I hope to do it justice.