The activation functions live in the neural network (nn) library in TensorFlow. Besides using the built-in activation functions, we can also design our own using TensorFlow operations. We can import the predefined activation functions (import tensorflow.nn as nn) or be explicit and write tf.nn in our function calls. Here, we choose to be explicit with each function call.
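All of the snippets below call sess.run(), so they assume that TensorFlow has been imported and a graph session has been created. A minimal setup sketch (assuming the TensorFlow 1.x Session API used by these snippets) is:

import tensorflow as tf

# Create the graph session that the following sess.run() calls rely on
sess = tf.Session()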
- The rectified linear unit, known as ReLU, is the most common and basic way to introduce non-linearity into neural networks. This function is just max(0,x). It is continuous, but not smooth. It appears as follows:
print(sess.run(tf.nn.relu([-3., 3., 10.])))
[ 0. 3. 10.]
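Since ReLU is just max(0,x), we can also build it from a basic TensorFlow operation, as mentioned at the start of this section. This is only an illustrative sketch; tf.nn.relu is the built-in way:

# Hand-rolled ReLU: element-wise max(0, x) via tf.maximum
x_vals = tf.constant([-3., 3., 10.])
print(sess.run(tf.maximum(0., x_vals)))
# Should match the tf.nn.relu output above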
- There are times when we will want to cap the linearly increasing part of the preceding ReLU activation function. We can do this by nesting the max(0,x) function inside a min() function. The implementation that TensorFlow has is called the ReLU6 function, defined as min(max(0,x),6). This is a version of the hard sigmoid function; it is computationally fast and does not suffer from vanishing (infinitesimally near zero) or exploding values. This will come in handy when we discuss deeper neural networks in Chapter 8, Convolutional Neural Networks, and Chapter 9, Recurrent Neural Networks. It appears as follows:
print(sess.run(tf.nn.relu6([-3., 3., 10.])))
[ 0. 3. 6.]
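To see the nesting of min() around max(0,x) explicitly, we can write ReLU6 out by hand with basic operations (an illustrative sketch only):

# Hand-rolled ReLU6: min(max(0, x), 6)
x_vals = tf.constant([-3., 3., 10.])
print(sess.run(tf.minimum(tf.maximum(0., x_vals), 6.)))
# Should match the tf.nn.relu6 output above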
- The sigmoid function is the most common continuous and smooth activation function. It is also called a logistic function and has the form 1 / (1 + exp(-x)). The sigmoid function is not used very often because of its tendency to zero-out the backpropagation terms during training. It appears as follows:
print(sess.run(tf.nn.sigmoid([-1., 0., 1.])))
[ 0.26894143 0.5 0.7310586 ]
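We can also check the logistic form directly by composing basic operations (a sketch for illustration; tf.nn.sigmoid is the built-in way):

# Logistic form of the sigmoid: 1 / (1 + exp(-x))
x_vals = tf.constant([-1., 0., 1.])
print(sess.run(1. / (1. + tf.exp(-x_vals))))
# Should match the tf.nn.sigmoid output above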
We should be aware that some activation functions are not zero-centered, such as the sigmoid. This will require us to zero-mean the data prior to using it in most computational graph algorithms.
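As a minimal sketch of what zero-meaning the data means in practice (the data values here are made up for illustration):

# Subtract the mean so the data is centered at zero before feeding it in
data = tf.constant([1., 2., 3., 4.])
print(sess.run(data - tf.reduce_mean(data)))
# [-1.5 -0.5  0.5  1.5]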
- Another smooth activation function is the hyperbolic tangent. The hyperbolic tangent function is very similar to the sigmoid except that instead of having a range between 0 and 1, it has a range between -1 and 1. This function has the form of the ratio of the hyperbolic sine over the hyperbolic cosine, which can also be written as (exp(x) - exp(-x)) / (exp(x) + exp(-x)). This activation function is as follows:
print(sess.run(tf.nn.tanh([-1., 0., 1.])))
[-0.76159418 0. 0.76159418 ]
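The ratio form can be written out with basic operations as a quick check (illustrative sketch only):

# tanh as hyperbolic sine over hyperbolic cosine:
# (exp(x) - exp(-x)) / (exp(x) + exp(-x))
x_vals = tf.constant([-1., 0., 1.])
print(sess.run((tf.exp(x_vals) - tf.exp(-x_vals)) / (tf.exp(x_vals) + tf.exp(-x_vals))))
# Should match the tf.nn.tanh output above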
- The softsign function also gets used as an activation function. The form of this function is x / (|x| + 1). The softsign function is supposed to be a continuous (but not smooth) approximation to the sign function. See the following code:
print(sess.run(tf.nn.softsign([-1., 0., 1.])))
[-0.5 0. 0.5]
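Writing the softsign form out with basic operations (a sketch for illustration):

# Softsign written out as x / (|x| + 1)
x_vals = tf.constant([-1., 0., 1.])
print(sess.run(x_vals / (tf.abs(x_vals) + 1.)))
# Should match the tf.nn.softsign output above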
- Another function, the softplus function, is a smooth version of the ReLU function. The form of this function is log(exp(x) + 1). It appears as follows:
print(sess.run(tf.nn.softplus([-1., 0., 1.])))
[ 0.31326166 0.69314718 1.31326163]
The softplus function goes to infinity as the input increases, whereas the softsign function goes to 1. As the input gets smaller, however, the softplus function approaches zero and the softsign function goes to -1.
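We can illustrate both the softplus form and these limits with a quick sketch (the input values here are arbitrary, chosen only to show the behavior at large magnitudes):

# Softplus written out as log(exp(x) + 1), compared with softsign
x_vals = tf.constant([-10., 10.])
print(sess.run(tf.log(tf.exp(x_vals) + 1.)))   # near 0 on the left, grows with x on the right
print(sess.run(tf.nn.softsign(x_vals)))        # approaches -1 and 1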
- The Exponential Linear Unit (ELU) is very similar to the softplus function except that the bottom asymptote is -1 instead of 0. The form is (exp(x) - 1) if x < 0 else x. It appears as follows:
print(sess.run(tf.nn.elu([-1., 0., 1.])))
[-0.63212055 0. 1. ]
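The piecewise form can be written out with tf.where as an illustrative sketch:

# ELU by hand: exp(x) - 1 where x < 0, x otherwise
x_vals = tf.constant([-1., 0., 1.])
print(sess.run(tf.where(x_vals < 0., tf.exp(x_vals) - 1., x_vals)))
# Should match the tf.nn.elu output above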