Neural Networks Learning The Basics: Layers, Activation

This post continues from neural network basics part 1: Layers Matrix Multiplication, and assumes that you have a basic understanding of what a neuron is.

Why Add an Activation Function?

In neural network basics part 1: Layers Matrix Multiplication, I covered matrix multiplication and defined a simple neural network as the weighted sum of its inputs, with the equation:

Y = \sum{input*weight}

If we include the bias term, which is added once after the sum, this becomes:

Y = \sum{(input*weight)} + bias
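To make the weighted sum concrete, here is a minimal Python sketch for a single neuron. The function name weighted_sum and the input, weight and bias values are made up for illustration and are not taken from the spreadsheet. The bias is added once, after the products have been summed.

```python
def weighted_sum(inputs, weights, bias=0.0):
    """Return sum(input_i * weight_i) + bias for a single neuron."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Multiply each input by its weight, add the products, then add the bias once.
    return sum(x * w for x, w in zip(inputs, weights)) + bias


# Illustrative values only (assumptions, not the values used in the Excel example).
inputs = [1.0, 2.0, 3.0]
weights = [0.5, -1.0, 0.25]
bias = 0.1

y = weighted_sum(inputs, weights, bias)
print(y)
```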

In the above, Y can take on any value between -inf and inf. The activation function transforms Y, and many activation functions (sigmoid and tanh, for example) restrict it to a bounded range. This property is important in the learning process, as we will see in the Backpropagation post. Applying an activation function F() transforms the equation to:

Z = F(Y) 

When F() is non-linear, this applies a non-linear transformation to the output, giving the neural network the capability to learn complex patterns. The equation without F(), the activation, is simply a linear transformation, and stacking several linear layers on top of each other still produces a linear transformation. A linear equation does not have the capability to learn more complex patterns. Therefore, a neural network without an activation function is equivalent to a linear model, such as linear regression.
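The point about linearity can be checked numerically. The sketch below uses a one-input, one-neuron-per-layer network with made-up weights (w1, b1, w2, b2 are assumptions for illustration). Two layers without an activation give exactly the same output as one linear layer with combined weights, while putting tanh between the layers breaks that equivalence.

```python
import math


def linear_layer(x, w, b):
    """A one-input, one-output layer without activation: w * x + b."""
    return w * x + b


# Illustrative weights and biases (assumptions).
w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 0.5


def two_linear_layers(x):
    # Layer 2 applied to the output of layer 1, no activation in between.
    return linear_layer(linear_layer(x, w1, b1), w2, b2)


def single_linear_layer(x):
    # The same mapping written as ONE layer:
    # w2 * (w1 * x + b1) + b2 = (w2 * w1) * x + (w2 * b1 + b2)
    return linear_layer(x, w2 * w1, w2 * b1 + b2)


def two_layers_with_tanh(x):
    # With tanh between the layers, the network is no longer a single linear map.
    return linear_layer(math.tanh(linear_layer(x, w1, b1)), w2, b2)


for x in [-1.0, 0.0, 2.0]:
    print(x, two_linear_layers(x), single_linear_layer(x), two_layers_with_tanh(x))
```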

How do we apply this transformation to our data?

Popular activation functions in machine learning literature include:

  1. Step function
  2. Linear function
  3. Sigmoid function
  4. Hyperbolic tangent function (tanh)
  5. ReLU (rectified linear unit)
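For reference, the five functions listed above can be sketched in plain Python as follows. These are the common textbook definitions, not a transcription of the spreadsheet formulas. The step function here uses a threshold of 0 and outputs 0 or 1, which is an assumption, since other conventions (such as -1 and 1) also exist.

```python
import math


def step(y, threshold=0.0):
    """Step function: 1 if y reaches the threshold, otherwise 0 (assumed convention)."""
    return 1.0 if y >= threshold else 0.0


def linear(y):
    """Linear (identity) function: the output is unbounded."""
    return y


def sigmoid(y):
    """Sigmoid: squashes y into the range (0, 1)."""
    # Two algebraically equal forms, chosen so math.exp never overflows.
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    e = math.exp(y)
    return e / (1.0 + e)


def tanh(y):
    """Hyperbolic tangent: squashes y into the range (-1, 1)."""
    return math.tanh(y)


def relu(y):
    """ReLU: 0 for negative y, y otherwise (unbounded above)."""
    return max(0.0, y)


for y in [-3.0, 0.0, 3.0]:
    print(y, step(y), linear(y), sigmoid(y), tanh(y), relu(y))
```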

These functions can be visualized as:

These activation functions have been derived from the MS Excel document below, and we can see that their output ranges differ.

Let us use the hyperbolic tangent function (tanh) as an example. The expression for the tanh function is:

F(Y) = \frac {e^{Y} - e^{-Y}} {e^{Y} + e^{-Y}} 

Therefore, if Y is equal to 3, F(3) = (EXP(3)-EXP(-3))/(EXP(3)+EXP(-3)) = 0.995055
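The same calculation can be reproduced outside of Excel. This short Python sketch evaluates the tanh expression directly from its definition. The function name tanh_from_definition is only illustrative. The result agrees with Python's built-in math.tanh.

```python
import math


def tanh_from_definition(y):
    """Evaluate (e^y - e^-y) / (e^y + e^-y), mirroring the Excel formula."""
    return (math.exp(y) - math.exp(-y)) / (math.exp(y) + math.exp(-y))


value = tanh_from_definition(3)
print(round(value, 6))          # 0.995055
print(round(math.tanh(3), 6))   # 0.995055, the built-in gives the same result
```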

Let us apply the tanh transformation to the Excel example.

Neural network with activation tanh

The above spreadsheet can be downloaded here:

Here we can see that the tanh function restricts the output values to the range between -1 and 1. Therefore, if the activation function in the output layer is tanh, as with an LSTM neural network, scaling the data is recommended if it is not already within these boundaries. The purpose of this post was to define the activation function and show how it is applied to a neural network. Choosing which activation function to use will be covered in the architecture choice post.
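As an illustration of the scaling mentioned above, one common option is min-max scaling to the range -1 to 1, so that the values line up with what a tanh output can produce. This is only a sketch of one possible approach. The function name scale_to_range and the sample values are assumptions, not part of the spreadsheet.

```python
def scale_to_range(values, low=-1.0, high=1.0):
    """Min-max scale a list of numbers so they fall within [low, high]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # All values are identical: map them to the middle of the target range.
        return [(low + high) / 2.0 for _ in values]
    span = v_max - v_min
    return [low + (v - v_min) * (high - low) / span for v in values]


# Illustrative target values (assumption).
targets = [10.0, 15.0, 20.0, 30.0]
scaled = scale_to_range(targets)
print(scaled)
```

To interpret predictions in the original units, the same minimum and maximum would need to be stored and the mapping inverted.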


In this post I explained how an activation function is applied to a neural network. This concludes the discussion of how neuron values are computed. Recall that the weight is a trainable parameter and that in my example I used random values, not optimal values. So how does a neural network learn the optimal values? There needs to be some sort of learning rule so that the neural network can learn how to best represent Y. In the next blog post I will look at the loss function and Backpropagation.

Backpropagation is how the neural network learns the values of the weights that best represent Y. The weights that best represent Y are those that produce the lowest error, and the loss function calculates that error. This process forms the backbone of neural networks and I hope to do it justice.

Liked it? Take a second to support Samantha Van Der Merwe on Patreon!
