# Feedforward Propagation

## What is Feedforward Propagation?
It is the first step in the training of a neural network (after the initialization of the weights, which will be covered in the next lecture). The forward direction means going from the input to the output nodes.
**Definition 56**

Feedforward propagation, also called the *forward pass*, is the process of computing and storing the output values of all the network's nodes, starting with the first hidden layer and ending with the output layer, using as input either a subset of the dataset or all of its samples.
Forward propagation thus leads to a list of the neural network's predictions, one for each data row used as input. At each node, the computation is the key equation (48) we saw in the previous section, Model Representation, written again for convenience:

$$a = f\left( \sum_{i} w_i\, x_i + b \right)$$

where $f$ is the activation function, the $x_i$ are the inputs of the node, the $w_i$ the associated weights and $b$ the bias.
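To make this concrete, here is a minimal NumPy sketch of that single-node computation. The sigmoid is used as an example activation function, and the numerical values of the inputs, weights and bias are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation, standing in for a generic f."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative inputs, weights and bias for a single node:
x = np.array([0.5, -1.2, 3.0])   # inputs x_i reaching the node
w = np.array([0.1, 0.4, -0.2])   # weights w_i, one per input
b = 0.05                         # bias b of the node

# Output of the node: activation of the weighted sum plus the bias
a = sigmoid(np.dot(w, x) + b)
print(a)   # a single number between 0 and 1
```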
There will be some changes in the notation compared to this equation, though. Let's define everything in the next subsection.
## Notations
Let's say we have the following network with two hidden layers and a single output node:

*Fig. 53: A feedforward neural network with the notation we will use for the forward propagation equations (more in text). Image from the author.*
There are lots of subscripts and superscripts here. Let's explain the conventions we will use.
**Input data**

We saw in Lecture 2 that the dataset in supervised learning can be represented as a matrix $X$ of $m$ rows (the samples) and $n$ columns (the input features).
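As a quick illustration (with made-up numbers), such a matrix could be built in NumPy as:

```python
import numpy as np

# A toy dataset of m = 4 samples and n = 3 input features:
X = np.array([
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.3],
    [6.2, 2.9, 4.3],
    [5.9, 3.0, 5.1],
])
print(X.shape)   # (4, 3): one row per sample, one column per feature
```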
**Activation units**

In a given layer $\ell$, the outputs of the activation units are written $a^{(\ell)}_j$, where the superscript is the layer number and the subscript $j$ is the row of the activation unit in the layer, starting from the top. We will collect them in a row vector:

$$a^{(\ell)} = \left( a^{(\ell)}_1, a^{(\ell)}_2, \dots, a^{(\ell)}_{n_\ell} \right)$$

with $n_\ell$ the number of nodes in layer $\ell$.
**Biases**

The biases are also row vectors, one for each layer they connect to and of dimension the number of nodes in that layer:

$$b^{(\ell)} = \left( b^{(\ell)}_1, b^{(\ell)}_2, \dots, b^{(\ell)}_{n_\ell} \right)$$

If the last layer is only made of one node, like in our example above, then $b^{(3)}$ reduces to a single scalar $b^{(3)}_1$.
**Weights**

Now the weights. You may see different ways to represent them in the literature. Here we use a convention we could write as:

$$w^{(\ell)}_{ij} : \text{weight connecting node } i \text{ of layer } \ell-1 \text{ to node } j \text{ of layer } \ell$$

In other words, the first index is the row of the node from the previous layer (the departing node of the weight's arrow) and the second index is the row of the node in the current layer (the one the weight's arrow points to). For instance, $w^{(2)}_{13}$ is the weight connecting the first node of layer 1 to the third node of layer 2.

We can actually collect all weights going from layer $\ell-1$ to layer $\ell$ into a matrix $W^{(\ell)}$ of dimensions $n_{\ell-1} \times n_\ell$:

$$W^{(\ell)} = \begin{pmatrix} w^{(\ell)}_{11} & \cdots & w^{(\ell)}_{1\,n_\ell} \\ \vdots & \ddots & \vdots \\ w^{(\ell)}_{n_{\ell-1}\,1} & \cdots & w^{(\ell)}_{n_{\ell-1}\,n_\ell} \end{pmatrix}$$
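To check this indexing convention, here is a small NumPy sketch that builds randomly initialized weight matrices and bias vectors. The layer sizes (3 input features, two hidden layers of 4 nodes, one output node) and the random seed are illustrative choices, not fixed by the text:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical layer sizes: [n, n_1, n_2, n_3]
sizes = [3, 4, 4, 1]

# With our convention w_ij, the matrix W^(l) has one row per node
# of layer l-1 and one column per node of layer l:
weights = [rng.normal(size=(sizes[l - 1], sizes[l])) for l in range(1, len(sizes))]
biases  = [rng.normal(size=(1, sizes[l])) for l in range(1, len(sizes))]

for l, (W, b) in enumerate(zip(weights, biases), start=1):
    print(f"W^({l}): {W.shape}   b^({l}): {b.shape}")
# W^(1): (3, 4)   b^(1): (1, 4)
# W^(2): (4, 4)   b^(2): (1, 4)
# W^(3): (4, 1)   b^(3): (1, 1)
```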
Let’s now see how we calculate all the values of the activation units!
## Step-by-step calculations
### Computation of the third layer (output)
With one output node, the computation is actually simpler than for the hidden layers above. We can still write it in the same form as Equation (61):

$$a^{(3)} = f\left( a^{(2)}\, W^{(3)} + b^{(3)} \right)$$

with $a^{(2)}$ the row vector of the second hidden layer's outputs. In our case the output layer has a single node, so $a^{(3)}$ reduces to a single number: the network's prediction. The matrix $W^{(3)}$ has dimensions $n_2 \times 1$, i.e. it is a column vector. The bias 'vector' is actually a scalar: $b^{(3)} = b^{(3)}_1$.
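In code, this last step could look like the following minimal NumPy sketch. The numerical values of $a^{(2)}$, $W^{(3)}$ and $b^{(3)}$ are made up, and the sigmoid again stands in for a generic activation $f$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Outputs of the second hidden layer, as a row vector:
a2 = np.array([[0.3, 0.8, 0.1, 0.5]])         # shape (1, 4)

# Third (output) layer: W^(3) is a column vector, b^(3) a scalar
W3 = np.array([[0.2], [-0.6], [0.9], [0.4]])  # shape (4, 1)
b3 = 0.1

a3 = sigmoid(a2 @ W3 + b3)   # a^(3) = f( a^(2) W^(3) + b^(3) )
print(a3)                    # shape (1, 1): a single prediction
```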
That's the end of the forward propagation process! As you can see, it involves a lot of calculations. You may now understand why activation functions that are simple and fast to compute are preferable: they are evaluated every time we compute the output of an activation unit.
Let's now condense all of this into a general formula.
## General rule for Forward Propagation
If we rewrite the first layer of inputs as:

$$a^{(0)} = x$$

with $x$ the row vector of input features of a given sample (or the whole matrix $X$ if we process all samples at once), then we can write a general rule for computing the outputs of a fully connected layer $\ell$:

$$a^{(\ell)} = f\left( a^{(\ell-1)}\, W^{(\ell)} + b^{(\ell)} \right), \qquad \ell = 1, \dots, L$$

with $L$ the total number of layers (hidden and output). This is the general rule for computing all outputs of a fully connected feedforward neural network.
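Putting everything together, here is a minimal NumPy sketch of this general rule. The `forward_pass` name, the sigmoid activation applied at every layer, and the random toy network are illustrative assumptions, not part of the formula itself:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_pass(X, weights, biases):
    """Compute and store the activations of every layer.

    X       : (m, n) matrix of input samples, playing the role of a^(0)
    weights : list of matrices W^(l), each of shape (n_{l-1}, n_l)
    biases  : list of row vectors b^(l), each of shape (1, n_l)
    Returns the list of activations [a^(0), a^(1), ..., a^(L)].
    """
    activations = [X]
    a = X
    for W, b in zip(weights, biases):
        a = sigmoid(a @ W + b)   # a^(l) = f( a^(l-1) W^(l) + b^(l) )
        activations.append(a)
    return activations

# Quick check on random inputs, with the toy layer sizes used above:
rng = np.random.default_rng(seed=0)
sizes = [3, 4, 4, 1]
weights = [rng.normal(size=(sizes[l - 1], sizes[l])) for l in range(1, len(sizes))]
biases  = [rng.normal(size=(1, sizes[l])) for l in range(1, len(sizes))]

X = rng.normal(size=(5, 3))                # 5 samples, 3 features
y_pred = forward_pass(X, weights, biases)[-1]
print(y_pred.shape)                        # (5, 1): one prediction per sample
```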
## Summary
- Feedforward propagation is the computation of the values of all activation units of a fully connected feedforward neural network.
- As the process includes the last (output) layer, feedforward propagation also produces predictions.
- These predictions will be compared to the observed values.
- Feedforward propagation is one step in the training of a neural network.
- The next step of the training is to go 'backward', from the output error back towards the first layers, in order to update the weights: this is the backpropagation algorithm.
**Learn More**

Very nice animations illustrating the forward propagation process can be found in Xinyu You's course *An online deep learning course for humanists*.