The activation function determines the mapping between inputs and a hidden layer; it defines the functional form for how a neuron is activated. For example, a linear activation function could be defined as f(x) = x, in which case the value of the neuron would be the raw input, x, times the learned weight, yielding a linear model. A linear activation function is shown in the top panel of Figure 5.2. The problem with a linear activation function is that it does not permit any non-linear functional forms to be learned. Previously, we have used the hyperbolic tangent as an activation function, f(x) = tanh(x). The hyperbolic tangent can work well in some cases, but a potential limitation is that it saturates at either very low or very high input values, as shown in the middle panel of Figure 5.2.
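As a sketch of the point above, the snippet below (an illustrative example, not code from the text) compares a linear activation with the hyperbolic tangent and uses the derivative of tanh, 1 - tanh(x)^2, to show the saturation effect: the gradient shrinks toward zero at large-magnitude inputs.

```python
import numpy as np

def linear(x):
    # Linear activation f(x) = x: the neuron passes its weighted input through unchanged.
    return x

def tanh_act(x):
    # Hyperbolic tangent activation, bounded in (-1, 1).
    return np.tanh(x)

def tanh_grad(x):
    # Derivative of tanh: 1 - tanh(x)^2. It approaches 0 as |x| grows,
    # which is the saturation problem described above.
    return 1.0 - np.tanh(x) ** 2

xs = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(linear(xs))                  # unbounded, passes inputs through
print(np.round(tanh_act(xs), 4))   # squashed into (-1, 1)
print(np.round(tanh_grad(xs), 4))  # nearly 0 at |x| = 10: the gradient saturates
```

Because the gradient at saturated inputs is effectively zero, learning slows or stalls for those neurons, which motivates the alternative activation introduced next.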
Perhaps the most popular activation function currently, and a good first choice (Nair and Hinton, 2010), is known as a rectifier...