Understanding backpropagation
When a feedforward neural network is used to accept an input x and produce an output yˆ, information flows forward through the network elements. The input x provides the information that then propagates up to the hidden units at each layer and produces yˆ. This is called forward propagation. During training, forward propagation continues onward until it produces a scalar cost J(θ). The backpropagation algorithm, often called backprop, allows the information from the cost to then flow backward through the network in order to compute the gradient.
Computing an analytical expression for the gradient is straightforward, but numerically evaluating such an expression can be computationally expensive. The backpropagation algorithm does so using a simple and inexpensive procedure.
Note
Backpropagation refers only to the method to compute the gradient, while another algorithm, such as stochastic gradient descent, refers to the actual mechanism.