The Deep Learning Workshop
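To train a perceptron, we need the following components: a data representation, layers, a neural network representation, a loss function, an optimizer, and a training loop.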
In the previous section, we covered most of these components: the data representation of the input data and the true labels in TensorFlow. For layers, we have the linear layer and the activation functions, which we saw in the form of the net input function and the sigmoid function, respectively. For the neural network representation, we made a function called perceptron(), which uses a linear layer and a sigmoid layer to perform predictions. What we did in the previous section, computing predictions from the input data and the initial weights and biases, is called forward propagation. Actual neural network training involves two stages: forward propagation and backward propagation. Let's look at the training process at a higher level: the network makes predictions through forward propagation, a loss function measures how far these predictions are from the true labels, and an optimizer propagates the error backward to update the weights and biases. Forward propagation is then repeated with the updated weights and biases.
This cycle continues until the loss is minimized.
Let's implement the theory we have discussed in TensorFlow. Revisit the code in Exercise 2.01, Perceptron Implementation, where the perceptron we created just did one forward pass. We got the following predictions, and we saw that our perceptron had not learned anything:
tf.Tensor(
[[0.5]
 [0.5]
 [0.5]
 [0.5]], shape=(4, 1), dtype=float32)
In order to make our perceptron learn, we need additional components, such as a training loop, a loss function, and an optimizer. Let's see how to implement these components in TensorFlow.
In the next exercise, when we train our model, we will use a Stochastic Gradient Descent (SGD) optimizer to minimize the loss. There are a few more advanced optimizers available and provided by TensorFlow out of the box. We will look at the pros and cons of each of them in later sections. The following code will instantiate a stochastic gradient descent optimizer using TensorFlow:
import tensorflow as tf

learning_rate = 0.01
optimizer = tf.optimizers.SGD(learning_rate)
The perceptron() function takes care of forward propagation, and the optimizer takes care of backpropagating the error. tf.optimizers.SGD creates an instance of a stochastic gradient descent optimizer, which updates the parameters of the network: the weights and biases. In this code, the loss is averaged over all the input examples, so each update step uses the whole dataset at once (strictly, full-batch gradient descent). We will discuss the functioning of the gradient descent optimizer in greater detail later in this chapter, along with the significance of the 0.01 parameter, which is known as the learning rate. The learning rate controls the size of the step SGD takes toward the minimum of the loss function, and it is another hyperparameter that needs to be tuned in order to train a neural network.
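To make the role of the learning rate concrete, here is a minimal plain-Python sketch of gradient descent on a toy one-parameter loss, L(w) = (w - 3)^2. This loss is an assumption chosen for illustration, not the perceptron's loss. Each step moves w against the gradient, scaled by the learning rate:

```python
# A minimal sketch of how the learning rate scales each gradient descent step.
# Assumption: a toy loss L(w) = (w - 3)**2 whose minimum is at w = 3.
def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    # Derivative of (w - 3)**2 with respect to w
    return 2.0 * (w - 3.0)

def gradient_descent(learning_rate, steps, w=0.0):
    for _ in range(steps):
        # Move against the gradient; the learning rate sets the step size
        w = w - learning_rate * grad(w)
    return w

for lr in (0.01, 0.1, 1.1):
    w_final = gradient_descent(lr, steps=100)
    print(f"learning_rate={lr}: w={w_final:.4f}, loss={loss(w_final):.4f}")
```

With a very small learning rate, w is still noticeably short of 3 after 100 steps; a moderate rate reaches the minimum; a rate that is too large overshoots on every step and diverges.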
The following code can be used to define the epochs, training loop, and loss function:
no_of_epochs = 1000
for n in range(no_of_epochs):
    loss = lambda: abs(tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(
            labels=y, logits=perceptron(X))))
    optimizer.minimize(loss, [W, B])
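Inside the training loop, the loss is defined as a lambda function. optimizer.minimize() accepts the loss as a callable with no arguments, which it evaluates itself while recording the gradients; this is why the loss is wrapped in a lambda rather than computed once before the call.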
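The tf.nn.sigmoid_cross_entropy_with_logits function calculates the loss value of each observation. It takes two arguments: labels=y and logits=perceptron(X).
perceptron(X) returns the predicted value, which is the result of forward propagation of the input, X. This is compared with the corresponding label value stored in y. tf.reduce_mean calculates the mean of the per-observation losses, and abs() takes its magnitude; because cross-entropy is never negative, abs() does not change the value here. Strictly speaking, this function expects raw logits (the net input before the sigmoid), whereas perceptron() already applies the sigmoid, so the sigmoid is effectively applied twice; the perceptron still learns, but passing the net input z as the logits is the more precise choice. optimizer.minimize takes the loss and adjusts the weights and bias as part of the backward propagation of the error.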
The forward propagation is executed again with the new values of weights and bias. And this forward and backward process continues for the number of iterations we define.
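The full forward and backward cycle can also be sketched without TensorFlow. The following NumPy version is an illustration under stated assumptions, not the book's code: it uses the OR data (inputs [0, 0], [0, 1], [1, 0], [1, 1] with labels 0, 1, 1, 1), writes the gradient of the mean sigmoid cross-entropy by hand instead of calling optimizer.minimize, passes the net input z as the logits, and uses a learning rate of 0.5 with 5,000 epochs (hypothetical values chosen so that this toy example converges quickly):

```python
import numpy as np

# Hypothetical OR data: four 2-feature inputs and their labels, shape (4, 1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
y = np.array([[0], [1], [1], [1]], dtype=np.float32)

W = np.zeros((2, 1), dtype=np.float32)  # weights
B = np.zeros((1,), dtype=np.float32)    # bias
learning_rate = 0.5   # assumption: larger than 0.01 so this toy converges quickly
no_of_epochs = 5000

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_cross_entropy_with_logits(labels, logits):
    # Numerically stable per-observation loss computed from raw logits
    return np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))

for n in range(no_of_epochs):
    # Forward propagation: the net input z serves as the logits
    z = X @ W + B
    loss = np.mean(sigmoid_cross_entropy_with_logits(y, z))
    # Backward propagation: d(mean loss)/dz = (sigmoid(z) - y) / N
    dz = (sigmoid(z) - y) / len(X)
    dW = X.T @ dz
    dB = dz.sum(axis=0)
    # Gradient descent update of the weights and bias
    W -= learning_rate * dW
    B -= learning_rate * dB

ypred = np.round(sigmoid(X @ W + B))
print("final loss:", loss)
print("accuracy:", np.mean(ypred == y))
```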
During backpropagation, the optimizer computes the gradient of the loss with respect to the weights and biases and moves each of them a small step, scaled by the learning rate, in the direction that reduces the loss. This update is applied on every iteration: the optimizer does not compare the loss with the previous cycle or keep the best values seen so far, so the final values of W and B are simply those from the last iteration.
We have set the number of epochs for the training to 1,000 iterations. There is no rule of thumb for setting the number of epochs since the number of epochs is a hyperparameter. But how do we know when training has taken place successfully?
When the values of the weights and biases have changed, we can conclude that training has taken place. If we used this training loop on the OR data we saw in Exercise 2.01, Perceptron Implementation, we would see weights roughly equal to the following:
[[0.412449151]
 [0.412449151]]
And the bias would be something like this:
0.236065879
When the network has learned, that is, the weights and biases have been updated, we can see whether it is making accurate predictions using accuracy_score from the scikit-learn package. We can use it to measure the accuracy of the predictions as follows:
from sklearn.metrics import accuracy_score
print(accuracy_score(y, ypred))
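Here, accuracy_score takes two parameters, the label values (y) and the predicted values (ypred), and returns the fraction of predictions that match the labels. The predictions must be class labels, so the sigmoid outputs need to be rounded to 0 or 1 first; accuracy_score raises an error if it is given continuous probabilities. Let's say the result is 1.0. This means the perceptron is 100% accurate.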
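Because accuracy_score needs class labels, the probabilities are rounded first. A minimal NumPy sketch of that step, using hypothetical labels and sigmoid outputs, computes the same fraction of matching predictions that accuracy_score reports for binary labels:

```python
import numpy as np

# Hypothetical labels and sigmoid outputs for four observations
y = np.array([[0], [1], [1], [1]])
probabilities = np.array([[0.2], [0.7], [0.9], [0.4]])

# Convert probabilities into class labels (0 or 1) with a 0.5 threshold
ypred = np.round(probabilities)

# Fraction of predictions that match the labels
accuracy = np.mean(ypred == y)
print(accuracy)  # 0.75: the last observation (0.4 -> 0) is misclassified
```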
In the next exercise, we will train our perceptron to perform a binary classification.
In the previous section, we learned how to train a perceptron. In this exercise, we will train our perceptron to approximate a slightly more complicated function. We will be using randomly generated external data with two classes: class 0 and class 1. Our trained perceptron should be able to assign each data point to its class:
Note
The data is in a CSV file called data.csv. You can download the file from GitHub by visiting https://packt.live/2BVtxIf.
Import the required libraries:
import tensorflow as tf
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
%matplotlib inline
Apart from tensorflow, we will need pandas to read the data from the CSV file, confusion_matrix and accuracy_score to measure the accuracy of our perceptron after the training, and matplotlib to visualize the data.
Read the data from the data.csv file. It should be in the same path as the Jupyter Notebook file in which you are running this exercise's code; otherwise, you will have to change the path in the code before executing it:
df = pd.read_csv('data.csv')
df.head()
The output will be as follows:

Figure 2.10: Contents of the DataFrame
As you can see, the data has three columns. x1 and x2 are the features, and the label column contains the labels 0 or 1 for each observation. The best way to see this kind of data is through a scatter plot.
Visualize the data using matplotlib:
plt.scatter(df[df['label'] == 0]['x1'], \
            df[df['label'] == 0]['x2'], \
            marker='*')
plt.scatter(df[df['label'] == 1]['x1'], \
            df[df['label'] == 1]['x2'], marker='<')
The output will be as follows:

Figure 2.11: Scatter plot of external data
The plot shows the two distinct classes of the data as two different shapes. Data with the label 0 is represented by a star, while data with the label 1 is represented by a triangle.
Separate the features and the labels:
X_input = df[['x1','x2']].values
y_label = df[['label']].values
X_input contains the features, x1 and x2. The .values attribute converts the selected columns into a NumPy array (matrix format), which is what is expected as input when the tensors are created. y_label contains the labels in the same matrix format, as a single column.
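The double brackets in df[['label']] matter. A small sketch with a hypothetical three-row DataFrame standing in for data.csv shows the resulting shapes; the (n, 1) column shape of y_label matches the (n, 1) output of a perceptron with a single unit, whereas single brackets would give a 1-D array:

```python
import pandas as pd

# Hypothetical stand-in for data.csv with the same three columns
df = pd.DataFrame({'x1': [0.5, -1.2, 2.0],
                   'x2': [1.1, 0.3, -0.7],
                   'label': [0, 1, 0]})

X_input = df[['x1', 'x2']].values   # 2-D array, shape (3, 2)
y_label = df[['label']].values      # double brackets keep a column: shape (3, 1)
y_flat = df['label'].values         # single brackets give a 1-D array: shape (3,)

print(X_input.shape, y_label.shape, y_flat.shape)
```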
Convert the features and labels into TensorFlow variables of type float:
x = tf.Variable(X_input, dtype=tf.float32)
y = tf.Variable(y_label, dtype=tf.float32)
Exercise2.02.ipynb
Number_of_features = 2
Number_of_units = 1
learning_rate = 0.01
# weights and bias
weight = tf.Variable(tf.zeros([Number_of_features, \
                               Number_of_units]))
bias = tf.Variable(tf.zeros([Number_of_units]))
#optimizer
optimizer = tf.optimizers.SGD(learning_rate)
def perceptron(x):
    z = tf.add(tf.matmul(x, weight), bias)
    output = tf.sigmoid(z)
    return output
The complete code for this step can be found at https://packt.live/3gJ73bY.
Note
The # symbol in the code snippet above denotes a code comment. Comments are added into code to help explain specific bits of logic.
Print the values of weight and bias to show that the perceptron has been trained:
tf.print(weight, bias)
The output is as follows:
[[-0.844034135]
 [0.673354745]] [0.0593947917]
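Pass the input data through the trained perceptron to get the predictions:
ypred = perceptron(x)
Round off the predictions to convert them into class labels, 0 or 1:
ypred = tf.round(ypred)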
Measure the accuracy using the accuracy_score method, as we did in the previous section:
acc = accuracy_score(y.numpy(), ypred.numpy())
print(acc)
The output is as follows:
1.0
The perceptron gives 100% accuracy.
Compute the confusion matrix using the confusion_matrix method from the scikit-learn package:
cnf_matrix = confusion_matrix(y.numpy(), \
                              ypred.numpy())
print(cnf_matrix)
The output will be as follows:
[[12  0]
 [ 0  9]]
All the counts lie along the diagonal: 12 observations of class 0 and 9 observations of class 1 are correctly classified by our trained perceptron, and there are no misclassifications, which matches its 100% accuracy.
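The layout of the confusion matrix follows scikit-learn's convention: rows are true classes and columns are predicted classes. A hand-rolled two-class version (confusion_matrix_2class is a hypothetical helper written only for illustration) makes the counting explicit:

```python
import numpy as np

def confusion_matrix_2class(y_true, y_pred):
    # Rows are true classes, columns are predicted classes
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[int(t), int(p)] += 1
    return cm

# Hypothetical labels: one class-1 point is misclassified as class 0
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1]
cm = confusion_matrix_2class(y_true, y_pred)
print(cm)
print("accuracy:", np.trace(cm) / cm.sum())
```

Off-diagonal entries count mistakes; here the single entry in row 1, column 0 is the class-1 point predicted as class 0, and the diagonal holds the 4 correct predictions out of 5.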
Note
To access the source code for this specific section, please refer to https://packt.live/3gJ73bY.
You can also run this example online at https://packt.live/2DhelFw. You must execute the entire Notebook in order to get the desired result.
In this exercise, we trained our perceptron into a binary classifier, and it has done pretty well. In the next exercise, we will see how to create a multiclass classifier.
A classifier that can handle two classes is known as a binary classifier, like the one we saw in the preceding exercise. A classifier that can handle more than two classes is known as a multiclass classifier. We cannot build a multiclass classifier with a single neuron. Now we move from one neuron to one layer of multiple neurons, which is required for multiclass classifiers.
A single layer of multiple neurons can be trained to be a multiclass classifier. Some of the key points are detailed here. You need as many neurons as the number of classes; that is, for a 3-class classifier, you need 3 neurons; for a 10-class classifier you need 10 neurons, and so on.
As we saw in binary classification, we used the sigmoid (logistic) function to get predictions in the range of 0 to 1. In multiclass classification, we use a special type of activation function called the Softmax activation function to get probabilities across the classes that sum to 1. With the sigmoid function in a multiclass setting, the per-class probabilities do not necessarily add up to 1, so Softmax is preferred.
Before we implement the multiclass classifier, let's explore the Softmax activation function.
The Softmax function is also known as the normalized exponential function. As the word normalized suggests, the Softmax function normalizes the input into a probability distribution that sums to 1. Mathematically, it is represented as follows:
Figure 2.12: Mathematical form of the Softmax function
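The figure itself is not reproduced here. In standard notation, for a vector z of K input values, Softmax computes each output as the exponential of that value divided by the sum of the exponentials of all the values:

```latex
$$\mathrm{softmax}(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \qquad i = 1, \dots, K$$
```

Every output is positive, and the outputs sum to 1 by construction.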
To understand what Softmax does, let's use TensorFlow's built-in softmax function and see the output.
So, for the following code:
values = tf.Variable([3,1,7,2,4,5], dtype=tf.float32)
output = tf.nn.softmax(values)
tf.print(output)
The output will be:
[0.0151037546 0.00204407098 0.824637055 0.00555636082 0.0410562605 0.111602485]
As you can see in the output, the input values are mapped to a probability distribution that sums to 1. Note that 7 (the highest value in the original input values) received the highest weight, 0.824637055. This is what the Softmax function is mainly used for: to focus on the largest values and suppress values that are below the maximum value. Also, if we sum the output, it adds up to approximately 1 (up to floating-point rounding).
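The same numbers can be reproduced from the formula with NumPy. This sketch subtracts the maximum before exponentiating, a common numerical-stability trick that cancels out in the ratio and so leaves the result unchanged:

```python
import numpy as np

def softmax(values):
    # Subtracting the max cancels in the ratio but keeps np.exp from overflowing
    shifted = values - np.max(values)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

values = np.array([3, 1, 7, 2, 4, 5], dtype=np.float32)
output = softmax(values)
print(output)
print(output.sum())
```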
Illustrating the example in more detail, let's say we want to build a multiclass classifier with 3 classes. We will need 3 neurons connected to a Softmax activation function:
Figure 2.13: Softmax activation function used in a multiclass classification setting
As seen in Figure 2.13, x1, x2, and x3 are the input features, which go through the net input function of each of the three neurons, each with its associated weights and bias (Wi,j and bi). Lastly, the outputs of the neurons are fed to a common Softmax activation function instead of individual sigmoid functions. The Softmax activation function outputs the probabilities of the 3 classes: P1, P2, and P3. Because of the Softmax layer, these three probabilities add up to 1.
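The layer in Figure 2.13 amounts to one matrix multiplication followed by a row-wise Softmax. A minimal NumPy sketch, with hypothetical random inputs and weights (two examples, three features, three neurons):

```python
import numpy as np

def softmax_rows(z):
    # Softmax applied independently to each row (one row per input example)
    shifted = z - z.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3))   # 2 hypothetical examples with features x1, x2, x3
W = rng.normal(size=(3, 3))   # W[i, j]: weight from input i to neuron j
b = np.zeros(3)               # one bias per neuron

z = x @ W + b                 # net input of the three neurons, shape (2, 3)
P = softmax_rows(z)           # P1, P2, P3 for each example
print(P.shape, P.sum(axis=1))
```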
As we saw in the previous section, Softmax highlights the maximum value and suppresses the rest of the values. Suppose a neural network is trained to classify inputs into three classes, and for a given set of inputs the correct class is class 2. If the network predicts correctly, P2 will have the highest value among the Softmax outputs. As you can see in the following figure, P2 has the highest value, which means the prediction is correct:
Figure 2.14: Probability P2 is the highest
An associated concept is one-hot encoding. As we have three different classes, class1, class2, and class3, we need to encode the class labels into a format that we can work with more easily; so, after applying one-hot encoding, we would see the following output:
Figure 2.15: One-hot encoded data for three classes
This makes the results quick and easy to interpret. In this case, the output that has the highest value is set to 1, and all others are set to 0. The one-hot encoded output of the preceding example would be like this:
Figure 2.16: One-hot encoded output probabilities
The labels of the training data also need to be one-hot encoded. And if they have a different format, they need to be converted into one-hot-encoded format before training the model. Let's do an exercise on multiclass classification with one-hot encoding.
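A minimal NumPy sketch of both directions, with hypothetical labels and probabilities: encoding class indices into one-hot rows, and decoding Softmax outputs by taking the index of the largest value in each row:

```python
import numpy as np

labels = np.array([0, 2, 1, 2])          # hypothetical class indices for 3 classes
one_hot = np.eye(3)[labels]              # each row has a single 1 in the label's column
print(one_hot)

# Decoding: pick the index of the largest value in each row
probabilities = np.array([[0.7, 0.2, 0.1],
                          [0.1, 0.3, 0.6],
                          [0.2, 0.5, 0.3],
                          [0.05, 0.15, 0.8]])
predicted = np.argmax(probabilities, axis=1)
predicted_one_hot = np.eye(3)[predicted]
print(predicted)  # [0 2 1 2]
```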
To perform multiclass classification, we will be using the Iris dataset (https://archive.ics.uci.edu/ml/datasets/Iris), which has 3 classes of 50 instances each, where each class refers to a type of Iris. We will have a single layer of three neurons using the Softmax activation function:
Note
You can download the dataset from GitHub using this link: https://packt.live/3ekiBBf.
Import the required libraries:
import tensorflow as tf
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
%matplotlib inline
from pandas import get_dummies
You should already be familiar with all of these imports, as they were used in the previous exercise, except for get_dummies. This function converts label data into the corresponding one-hot-encoded format.
Read the iris.csv data:
df = pd.read_csv('iris.csv')
df.head()
The output will be as follows:

Figure 2.17: Contents of the DataFrame
Visualize the data with a scatter plot:
plt.scatter(df[df['species'] == 0]['sepallength'], \
            df[df['species'] == 0]['sepalwidth'], marker='*')
plt.scatter(df[df['species'] == 1]['sepallength'], \
            df[df['species'] == 1]['sepalwidth'], marker='<')
plt.scatter(df[df['species'] == 2]['sepallength'], \
            df[df['species'] == 2]['sepalwidth'], marker='o')
The resulting plot will be as follows. The x axis denotes the sepal length and the y axis denotes the sepal width. The shapes in the plot represent the three species of Iris, setosa (star), versicolor (triangle), and virginica (circle):

Figure 2.18: Iris data scatter plot
There are three classes, as can be seen in the visualization, denoted by different shapes.
Separate the features and the labels:
x = df[['petallength', 'petalwidth', \
        'sepallength', 'sepalwidth']].values
y = df['species'].values
The .values attribute transforms the features into matrix format (a NumPy array).
One-hot encode the labels:
y = get_dummies(y)
y = y.values
get_dummies(y) will convert the labels into one-hot-encoded format.
Convert the features into a TensorFlow variable of type float32:
x = tf.Variable(x, dtype=tf.float32)
Define the perceptron layer with three neurons:
Number_of_features = 4
Number_of_units = 3
# weights and bias
weight = tf.Variable(tf.zeros([Number_of_features, \
                               Number_of_units]))
bias = tf.Variable(tf.zeros([Number_of_units]))
def perceptron(x):
    z = tf.add(tf.matmul(x, weight), bias)
    output = tf.nn.softmax(z)
    return output
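The code looks very similar to the single perceptron implementation. Only the Number_of_features and Number_of_units parameters change, to 4 and 3 respectively. Therefore, the weight matrix will be 4 x 3, and the bias will be a vector of 3 values (one per neuron) that is broadcast across all the input rows.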
The other change is in the activation function:
output = tf.nn.softmax(z)
We are using softmax instead of sigmoid.
Define the optimizer. We will be using the Adam optimizer. At this point, you can think of Adam as an improved version of gradient descent that usually converges faster. We will cover it in detail later in the chapter:
optimizer = tf.optimizers.Adam(.01)
Define the training function:
def train(i):
    for n in range(i):
        loss = lambda: abs(tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(
                labels=y, logits=perceptron(x))))
        optimizer.minimize(loss, [weight, bias])
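Again, the code looks very similar to the single-neuron implementation except for the loss function. Instead of sigmoid_cross_entropy_with_logits, we use softmax_cross_entropy_with_logits. As in the binary case, perceptron() already applies Softmax, so strictly the logits argument receives probabilities rather than raw logits; the model still learns, but passing the net input z would be the more precise choice.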
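To see why the logits argument matters, the following NumPy sketch reimplements the per-example softmax cross-entropy (a re-derivation for illustration, not TensorFlow's code) and compares the loss computed from a raw net input with the loss computed when Softmax outputs are passed in as if they were logits. The label and net input values are hypothetical:

```python
import numpy as np

def log_softmax(z):
    # Numerically stable log of the row-wise Softmax
    shifted = z - z.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def softmax_cross_entropy_with_logits(labels, logits):
    # Per-example loss: -sum_k labels_k * log(softmax(logits)_k)
    return -np.sum(labels * log_softmax(logits), axis=1)

labels = np.array([[0.0, 1.0, 0.0]])   # hypothetical one-hot label (class 2)
z = np.array([[1.0, 4.0, -2.0]])       # hypothetical raw net input (logits)
probs = np.exp(log_softmax(z))         # what a Softmax-ending perceptron returns

loss_from_logits = softmax_cross_entropy_with_logits(labels, z)
loss_from_probs = softmax_cross_entropy_with_logits(labels, probs)  # Softmax applied twice
print(loss_from_logits, loss_from_probs)
```

Because Softmax outputs lie between 0 and 1, applying Softmax to them again can never produce a probability above e/(e + 2), about 0.58, for three classes, so the loss cannot approach 0 even for a confident, correct prediction.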
Train the perceptron for 1000 iterations:
train(1000)
tf.print(weight)
The output shows the learned weights of our perceptron:
[[0.684310317 0.895633 -1.0132345]
 [2.6424644 -1.13437736 -3.20665336]
 [-2.96634197 -0.129377216 3.2572844]
 [-2.97383809 -3.13501668 3.2313652]]
Make predictions and measure the accuracy using accuracy_score, as in the previous exercise:
ypred = perceptron(x)
ypred = tf.round(ypred)
accuracy_score(y, ypred)
The output is:
0.98
It has given 98% accuracy, which is pretty good.
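One caveat about tf.round on Softmax outputs: if the largest probability in a row is below 0.5, the whole row rounds to zeros, which never matches a one-hot label, so accuracy_score counts it as wrong even when the largest probability is on the correct class. Taking the argmax of each row, a common alternative named here as a swap-in rather than the exercise's method, always yields exactly one predicted class. A NumPy sketch with hypothetical outputs:

```python
import numpy as np

# Hypothetical Softmax outputs for two examples (3 classes)
ypred = np.array([[0.10, 0.80, 0.10],
                  [0.40, 0.35, 0.25]])

rounded = np.round(ypred)                              # second row becomes all zeros
argmax_one_hot = np.eye(3)[np.argmax(ypred, axis=1)]   # exactly one 1 per row
print(rounded)
print(argmax_one_hot)
```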
Note
To access the source code for this specific section, please refer to https://packt.live/2Dhes3U.
You can also run this example online at https://packt.live/3iJJKkm. You must execute the entire Notebook in order to get the desired result.
In this exercise, we performed multiclass classification using our perceptron. Let's do a more complex and interesting case study of the handwritten digit recognition dataset in the next section.
Now that we have seen how to train a single neuron and a single layer of neurons, let's take a look at more realistic data. MNIST is a famous case study. In the next exercise, we will create a 10-class classifier to classify the MNIST dataset. However, before that, you should get a good understanding of the MNIST dataset.
Modified National Institute of Standards and Technology (MNIST) refers to a modified version of NIST's handwritten-digit data, created by a team led by Yann LeCun. This project was aimed at handwritten digit recognition using neural networks.
We need to understand the dataset before we get into writing the code. The MNIST dataset is integrated into the TensorFlow library. It consists of 70,000 handwritten images of the digits 0 to 9:
Figure 2.19: Handwritten digits
When we say images, you might think these are JPEG files, but they are not. They are stored in the form of pixel values; as far as the computer is concerned, an image is a bunch of numbers. Each image is 28 x 28 pixels and is stored as a 28 x 28 matrix in which every cell holds an integer from 0 to 255. These are grayscale images: 0 indicates white (the background), 255 indicates complete black, and values in between indicate shades of gray. The MNIST dataset is split into 60,000 training images and 10,000 test images.
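A minimal NumPy sketch of the preprocessing this implies, using randomly generated 28 x 28 uint8 arrays as a hypothetical stand-in for real MNIST images: scale the 0 to 255 pixel values into the range 0 to 1, then flatten each image into a vector of 784 values:

```python
import numpy as np

# Hypothetical stand-in for MNIST: 5 grayscale 28 x 28 images with uint8 pixels
rng = np.random.default_rng(42)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Scale pixel values from 0..255 into 0..1
scaled = images / 255.0

# Flatten each 28 x 28 image into a vector of 28 * 28 = 784 values
flat = scaled.reshape(len(images), 28 * 28).astype(np.float32)
print(flat.shape, flat.min() >= 0.0, flat.max() <= 1.0)
```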
Each image has a label associated with it ranging from 0 to 9. In the next exercise, let's build a 10-class classifier to classify the handwritten MNIST images.
In this exercise, we will build a single-layer 10-class classifier consisting of 10 neurons with the Softmax activation function. It will have an input layer of 784 pixels:
Import the required libraries:
import tensorflow as tf
import pandas as pd
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
%matplotlib inline
from pandas import get_dummies
Load the MNIST dataset and split it into train and test data:
mnist = tf.keras.datasets.mnist
(train_features, train_labels), (test_features, test_labels) = \
    mnist.load_data()
Normalize the pixel values from the range 0 to 255 into the range 0 to 1:
train_features, test_features = train_features / 255.0, \
    test_features / 255.0
Reshape each 28 x 28 image into a vector of 784 values using the reshape function:
x = tf.reshape(train_features, [60000, 784])
Create a Variable with the features and typecast it to float32:
x = tf.Variable(x)
x = tf.cast(x, tf.float32)
One-hot encode the labels:
y_hot = get_dummies(train_labels)
y = y_hot.values
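get_dummies creates one column per distinct label it sees. The full MNIST training set contains all ten digits, so this gives the 10 columns the 10-neuron layer expects, but on a small subset where a digit is missing, the column count would drop. A sketch with hypothetical labels; fixing the categories with pd.Categorical keeps all ten columns:

```python
import pandas as pd

# Hypothetical digit labels in which the digit 7 never appears
labels = [0, 1, 2, 3, 4, 5, 6, 8, 9, 1]

y_hot = pd.get_dummies(labels)
print(y_hot.shape)  # only the digits that occur get a column

# Declaring all ten categories guarantees one column per class, 0 to 9
y_fixed = pd.get_dummies(pd.Categorical(labels, categories=list(range(10))))
print(y_fixed.shape)
```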
Create a single-layer perceptron with 10 neurons and train it for 1000 iterations:
Exercise2.04.ipynb
#defining the parameters
Number_of_features = 784
Number_of_units = 10
# weights and bias
weight = tf.Variable(tf.zeros([Number_of_features, \
                               Number_of_units]))
bias = tf.Variable(tf.zeros([Number_of_units]))
The complete code for this step can be accessed from https://packt.live/3efd7Yh.
# Prepare the test data to measure the accuracy.
test = tf.reshape(test_features, [10000, 784])
test = tf.Variable(test)
test = tf.cast(test, tf.float32)
test_hot = get_dummies(test_labels)
test_matrix = test_hot.values
ypred = perceptron(test)
ypred = tf.round(ypred)
accuracy_score(test_hot, ypred)
The predicted accuracy is:
0.9304
Note
To access the source code for this specific section, please refer to https://packt.live/3efd7Yh.
You can also run this example online at https://packt.live/2Oc83ZW. You must execute the entire Notebook in order to get the desired result.
In this exercise, we saw how to create a single-layer multi-neuron neural network and train it as a multiclass classifier.
The next step is to build a multilayer neural network. However, before we do that, we must learn about the Keras API, since we use Keras to build dense neural networks.