Book Image

Intelligent Projects Using Python

By : Santanu Pattanayak
Book Image

Intelligent Projects Using Python

By: Santanu Pattanayak

Overview of this book

This book will be a perfect companion if you want to build insightful projects from leading AI domains using Python. The book covers detailed implementation of projects from all the core disciplines of AI. We start by covering the basics of how to create smart systems using machine learning and deep learning techniques. You will assimilate various neural network architectures such as CNN, RNN, LSTM, to solve critical new world challenges. You will learn to train a model to detect diabetic retinopathy conditions in the human eye and create an intelligent system for performing a video-to-text translation. You will use the transfer learning technique in the healthcare domain and implement style transfer using GANs. Later you will learn to build AI-based recommendation systems, a mobile app for sentiment analysis and a powerful chatbot for carrying customer services. You will implement AI techniques in the cybersecurity domain to generate Captchas. Later you will train and build autonomous vehicles to self-drive using reinforcement learning. You will be using libraries from the Python ecosystem such as TensorFlow, Keras and more to bring the core aspects of machine learning, deep learning, and AI. By the end of this book, you will be skilled to build your own smart models for tackling any kind of AI problems without any hassle.
Table of Contents (12 chapters)

The backpropagation method of training neural networks

In the backpropagation method, neural networks are trained through the gradient descent technique, where the combined weights vector, W, is updated iteratively, as follows:

Here, η is the learning rate, W(t+1) and W(t) are the weight vectors at iterations (t+1) and (t), respectively, and ∇C(W(t)) is the gradient of the cost function or the error function, with respect to the weight vector, W, at iteration (t). The previous algorithm for an individual weight or bias generalized by w ∈ W can be represented as follows:

As you can gather from the previous expressions, the heart of the gradient descent method of learning relies on computing the gradient of the cost function or the error function, with respect to each weight.

From the chain rule of differentiation, we know that if we have y = f(x), z = f(y), then the following is true:

This expression can be generalized to any number of variables. Now, let's take a look at a very simple neural network, as illustrated in the following diagram, in order to understand the backpropagation algorithm:

Figure 1.10: A network illustrating backpropagation

Let the input to the network be a two-dimensional vector, x = [x1 x2]T, and the corresponding output label and prediction be and , respectively. Also, let's assume that all of the activation units in the neural network are sigmoids. Let the generalized weight connecting any unit i in layer (l-1) to unit j in layer l be denoted by , while the bias in any unit i in layer l should be denoted by . Let's derive the gradient for one data point; the total gradient can be computed as the sum of all of the data points used in training (or in a mini-batch). If the output is continuous, then the loss function, C, can be chosen as the square of the error in prediction:

The weights and biases of the network, cumulatively represented by the set W, can be determined by minimizing the cost function with respect to the W vector, which is as follows:

To perform the minimization of the cost function iteratively through gradient descent, we need to compute the gradient of the cost function with respect to each weight, w ∈ W, as follows:

Now that we have everything that we need, let's compute the gradient of the cost function, C, with respect to the weight, . Using the chain rule of differentiation, we get the following:

Now let's look at the following formula:

As you can see in the previous expression, the derivative is nothing but the error in prediction. Generally, the output unit activation function is linear in the case of regression problems, and hence the following expression applies:

So, if we were to compute the gradient of the cost function with respect to the total input at the output unit, it would be . This is still equal to the error in prediction of the output.

The total input at the output unit, as a function of the incoming weights and activations, can be expressed as follows:

This means that, and the derivative of the cost function with respect to the weight, , contributing to the input of the output layer is given via the following:

As you can see, the error is backpropagated in computing the gradient of the cost function, with respect to the weights in the layers preceding the final output layer. This becomes more obvious when we compute the gradient of the cost function with respect to the generalized weight, . Let's take the weight corresponding to j=1 and k=2; that is, . The gradient of the cost function, C, with respect to this weight can be expressed as follows:

Now, , which means that, .

So, once we have figured out the gradient of the cost function with respect to the total input to a neuron as , the gradient of any weight, w, contributing to the total input, s, can be obtained by simply multiplying the activation, z, associated with the weight.

Now, the gradient of the cost function with respect to the total input, , can be derived by chain rule again, as follows:

Since all of the units of the neural network (except for the output unit) are sigmoid activation functions, the following is the case:

Combining (1), (2), and (3), we get the following:

In the preceding derived gradient expressions, you can see that the error in prediction, , is backpropagated by combining it with the relevant activations and weights (as per the chain rule of differentiation) for computing the gradients of the weights at each layer, hence, the name backpropagation in AI nomenclature.