Advanced Deep Learning with TensorFlow 2 and Keras - Second Edition

By: Rowel Atienza

Overview of this book

Advanced Deep Learning with TensorFlow 2 and Keras, Second Edition is a completely updated edition of the bestselling guide to the advanced deep learning techniques available today. Revised for TensorFlow 2.x, this edition introduces you to the practical side of deep learning with new chapters on unsupervised learning using mutual information, object detection (SSD), and semantic segmentation (FCN and PSPNet), further allowing you to create your own cutting-edge AI projects. Using Keras as an open-source deep learning library, the book features hands-on projects that show you how to create more effective AI with the most up-to-date techniques. Starting with an overview of multi-layer perceptrons (MLPs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs), the book then introduces more cutting-edge techniques as you explore deep neural network architectures, including ResNet and DenseNet, and how to create autoencoders. You will then learn about GANs, and how they can unlock new levels of AI performance. Next, you’ll discover how a variational autoencoder (VAE) is implemented, and how GANs and VAEs have the generative power to synthesize data that can be extremely convincing to humans. You'll also learn to implement deep reinforcement learning (DRL) methods such as Deep Q-Learning and Policy Gradient Methods, which are critical to many modern results in AI.

5. Recurrent Neural Network (RNN)

We're now going to look at the last of our three artificial neural networks, RNN.

RNNs are a family of networks that are suitable for learning representations of sequential data like text in natural language processing (NLP) or a stream of sensor data in instrumentation. While each MNIST data sample is not sequential in nature, it is not hard to imagine that every image can be interpreted as a sequence of rows or columns of pixels. Thus, a model based on RNNs can process each MNIST image as a sequence of 28-element input vectors with timesteps equal to 28. The following listing shows the code for the RNN model in Figure 1.5.1:

Figure 1.5.1: RNN model for MNIST digit classification

Listing 1.5.1: rnn-mnist-1.5.1.py

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, SimpleRNN
from tensorflow.keras.utils import to_categorical, plot_model
from tensorflow.keras.datasets import mnist

# load mnist dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# compute the number of labels
num_labels = len(np.unique(y_train))

# convert to one-hot vector
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# resize and normalize
image_size = x_train.shape[1]
x_train = np.reshape(x_train,[-1, image_size, image_size])
x_test = np.reshape(x_test,[-1, image_size, image_size])
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

# network parameters
input_shape = (image_size, image_size)
batch_size = 128
units = 256
dropout = 0.2

# model is RNN with 256 units, input is 28-dim vector 28 timesteps
model = Sequential()
model.add(SimpleRNN(units=units,
                    dropout=dropout,
                    input_shape=input_shape))
model.add(Dense(num_labels))
model.add(Activation('softmax'))
model.summary()
plot_model(model, to_file='rnn-mnist.png', show_shapes=True)

# loss function for one-hot vector
# use of sgd optimizer
# accuracy is good metric for classification tasks
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
# train the network
model.fit(x_train, y_train, epochs=20, batch_size=batch_size)

_, acc = model.evaluate(x_test,
                        y_test,
                        batch_size=batch_size,
                        verbose=0)
print("\nTest accuracy: %.1f%%" % (100.0 * acc))

There are two main differences between the RNN classifier and the two previous models. The first is input_shape = (image_size, image_size), which is actually input_shape = (timesteps, input_dim), that is, a sequence of timesteps vectors, each of dimension input_dim. The second is the use of a SimpleRNN layer to represent an RNN cell with units=256. The units variable represents the number of output units. Whereas the CNN is characterized by the convolution of kernels across the input feature map, the RNN output is a function not only of the present input but also of the previous output or hidden state. Since the previous output is itself a function of the input and output before it, the current output depends on the whole chain of earlier inputs and outputs. The SimpleRNN layer in Keras is a simplified version of the true RNN. The following equation describes the output of SimpleRNN:

h_t = tanh(b + W h_{t-1} + U x_t)

In this equation, b is the bias, while W and U are called recurrent kernel (weights for the previous output) and kernel (weights for the current input), respectively. Subscript t is used to indicate the position in the sequence. For a SimpleRNN layer with units=256, the total number of parameters is 256 + 256 × 256 + 256 × 28 = 72,960, corresponding to b, W, and U contributions.
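The recurrence and the parameter count can be checked with a few lines of NumPy. This is only a sketch under stated assumptions: it uses the tanh activation that Keras applies by default in SimpleRNN, random weights instead of trained ones, an all-zero initial state, and a hypothetical helper name, simple_rnn_step.

```python
import numpy as np

# Sizes taken from the text: 256 units, 28-dimensional input vectors.
units, input_dim = 256, 28

rng = np.random.default_rng(0)
b = np.zeros(units)                                  # bias
W = rng.normal(scale=0.05, size=(units, units))      # recurrent kernel
U = rng.normal(scale=0.05, size=(units, input_dim))  # kernel


def simple_rnn_step(h_prev, x_t):
    """One step of the recurrence (hypothetical helper, tanh assumed)."""
    return np.tanh(b + W @ h_prev + U @ x_t)


h0 = np.zeros(units)          # initial hidden state, assumed all zeros
x1 = rng.random(input_dim)    # stand-in for one row of a normalized image
h1 = simple_rnn_step(h0, x1)

# b, W, and U contributions: 256 + 256 * 256 + 256 * 28
num_params = b.size + W.size + U.size
print(h1.shape, num_params)
```

The count depends only on units and input_dim, not on the number of timesteps.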

The following figure shows the diagrams of both SimpleRNN and RNN when used for classification tasks. What makes SimpleRNN simpler than an RNN is the absence of the output values o_t = Vh_t + c before the softmax function is computed:

Figure 1.5.2: Diagram of SimpleRNN and RNN

RNNs might initially be harder to understand than MLPs or CNNs. In an MLP, the perceptron is the fundamental unit. Once the concept of the perceptron is understood, an MLP is just a network of perceptrons. In a CNN, the kernel is a patch or window that slides through the feature map to generate another feature map. In an RNN, the most important concept is the self-loop. There is in fact just one cell.

The illusion of multiple cells appears because diagrams draw one cell per timestep, but it is the same cell reused at every step; separate copies only appear when the network is unrolled. The weights of the underlying network are shared across all of these copies.
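The self-loop can be sketched as an ordinary Python loop that keeps calling one cell. The sizes below are small hypothetical values chosen for readability, the weights are random, and the names cell and run_rnn are illustrative only; tanh is assumed as the activation.

```python
import numpy as np

units, input_dim = 4, 3   # small hypothetical sizes

rng = np.random.default_rng(1)
params = {
    "b": np.zeros(units),
    "W": rng.normal(scale=0.5, size=(units, units)),      # recurrent kernel
    "U": rng.normal(scale=0.5, size=(units, input_dim)),  # kernel
}


def cell(h_prev, x_t):
    """The single RNN cell; every timestep uses the same params."""
    return np.tanh(params["b"] + params["W"] @ h_prev + params["U"] @ x_t)


def run_rnn(sequence):
    """Apply the cell once per timestep and return the last hidden state."""
    h = np.zeros(units)
    for x_t in sequence:      # the self-loop
        h = cell(h, x_t)
    return h


short_seq = rng.random((5, input_dim))    # 5 timesteps
long_seq = rng.random((28, input_dim))    # 28 timesteps
h_short = run_rnn(short_seq)
h_long = run_rnn(long_seq)

# The same weights served both sequences, so the count is unchanged.
num_params = sum(p.size for p in params.values())
print(h_short.shape, h_long.shape, num_params)
```

Writing the loop out by hand for a fixed number of steps gives the unrolled view: more boxes in the diagram, but no additional weights.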

The summary in Listing 1.5.2 indicates that using a SimpleRNN requires fewer parameters.

Listing 1.5.2: Summary of an RNN MNIST digit classifier

Layer (type)                   Output Shape       Param #
=================================================================
simple_rnn_1 (SimpleRNN)       (None, 256)        72960
dense_1 (Dense)                (None, 10)         2570
activation_1 (Activation)      (None, 10)         0
=================================================================
Total params: 75,530
Trainable params: 75,530
Non-trainable params: 0
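The totals in the summary can be reproduced with plain arithmetic. The sketch below assumes a 256-unit SimpleRNN on 28-dimensional inputs followed by a 10-way Dense layer, as in Listing 1.5.1; the variable names are illustrative.

```python
units, input_dim, num_labels = 256, 28, 10

# SimpleRNN: bias + recurrent kernel + kernel
simple_rnn_params = units + units * units + units * input_dim

# Dense: one weight per (input unit, label) pair plus one bias per label
dense_params = units * num_labels + num_labels

# An Activation layer only applies softmax; it has no weights
activation_params = 0

total_params = simple_rnn_params + dense_params + activation_params
print(simple_rnn_params, dense_params, activation_params, total_params)
```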

Figure 1.5.3 shows the graphical description of the RNN MNIST digit classifier. The model is very concise:


Figure 1.5.3: The RNN MNIST digit classifier graphical description

Table 1.5.1 shows that the SimpleRNN has the lowest accuracy among the networks presented:

Units   Optimizer   Regularizer    Train Accuracy (%)   Test Accuracy (%)
256     SGD         Dropout(0.2)   97.26                98.00
256     RMSprop     Dropout(0.2)   96.72                97.60
256     Adam        Dropout(0.2)   96.79                97.40
512     SGD         Dropout(0.2)   97.88                98.30

Table 1.5.1: The different SimpleRNN network configurations and performance measures

In many deep neural networks, other members of the RNN family are more commonly used. For example, Long Short-Term Memory (LSTM) has been used in both machine translation and question answering problems. LSTM addresses the problem of long-term dependency, that is, remembering past information that is relevant to the present output.

Unlike an RNN or a SimpleRNN, the internal structure of the LSTM cell is more complex. Figure 1.5.4 shows a diagram of LSTM. LSTM uses not only the present input and past outputs or hidden states, but it introduces a cell state, s_t, that carries information from one cell to the other. The information flow between cell states is controlled by three gates, f_t, i_t, and q_t. The three gates have the effect of determining which information should be retained or replaced and the amount of information in the past and current input that should contribute to the current cell state or output. We will not discuss the details of the internal structure of the LSTM cell in this book. However, an intuitive guide to LSTMs can be found at http://colah.github.io/posts/2015-08-Understanding-LSTMs.

The LSTM() layer can be used as a drop-in replacement for SimpleRNN(). If LSTM is overkill for the task at hand, a simpler version called a Gated Recurrent Unit (GRU) can be used. A GRU simplifies LSTM by combining the cell state and hidden state. A GRU also reduces the number of gates by one. The GRU() layer can also be used as a drop-in replacement for SimpleRNN().
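Swapping the layer is a one-line change, but it is not free in terms of model size. The following sketch estimates the parameter counts for the three layer types at units=256 and input_dim=28. The formulas are assumptions about the standard layouts rather than something stated in this section: an LSTM has four weight sets (three gates plus the candidate cell update), and a GRU has three weight sets, counted here with two bias vectors each, which is the tf.keras default (reset_after=True). The function names are hypothetical.

```python
def simple_rnn_params(units, input_dim):
    """Kernel + recurrent kernel + bias for one weight set."""
    return units * (units + input_dim) + units


def lstm_params(units, input_dim):
    """Assumed layout: four weight sets of SimpleRNN size."""
    return 4 * simple_rnn_params(units, input_dim)


def gru_params(units, input_dim):
    """Assumed layout: three weight sets, each with two bias vectors."""
    return 3 * (units * (units + input_dim) + 2 * units)


units, input_dim = 256, 28
counts = {
    "SimpleRNN": simple_rnn_params(units, input_dim),
    "LSTM": lstm_params(units, input_dim),
    "GRU": gru_params(units, input_dim),
}
for name, count in counts.items():
    print(f"{name:10s} {count:,}")
```

Under these assumptions, the GRU sits between the SimpleRNN and the LSTM in size, which matches its description as a simpler version of the LSTM.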

Figure 1.5.4: Diagram of LSTM. The parameters are not shown for clarity.

There are many other ways to configure RNNs. One way is making an RNN model that is bidirectional. By default, RNNs are unidirectional in the sense that the current output is only influenced by the past states and the current input.

In bidirectional RNNs, future states can also influence the present and past states by allowing information to flow backward. Past outputs are updated as needed depending on the new information received. RNNs can be made bidirectional by wrapping the recurrent layer in the Bidirectional wrapper. For example, a bidirectional LSTM is implemented as Bidirectional(LSTM()).
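The idea behind the wrapper can be sketched in NumPy with a simple tanh cell instead of an LSTM. This is an illustration under assumptions, not the Keras implementation: one cell reads the sequence from first to last timestep, a second cell with its own weights reads it from last to first, and the two final states are concatenated (concatenation is the default merge mode of the Keras wrapper). The sizes and the names make_cell and run are hypothetical.

```python
import numpy as np

units, input_dim, timesteps = 4, 3, 6   # small hypothetical sizes
rng = np.random.default_rng(2)


def make_cell():
    """Create a tanh RNN cell with its own random weights."""
    b = np.zeros(units)
    W = rng.normal(scale=0.5, size=(units, units))
    U = rng.normal(scale=0.5, size=(units, input_dim))

    def cell(h_prev, x_t):
        return np.tanh(b + W @ h_prev + U @ x_t)

    return cell


def run(cell, sequence):
    """Return the hidden state after the last element of the sequence."""
    h = np.zeros(units)
    for x_t in sequence:
        h = cell(h, x_t)
    return h


forward_cell = make_cell()    # reads timesteps 0, 1, ..., T-1
backward_cell = make_cell()   # reads timesteps T-1, ..., 1, 0

sequence = rng.random((timesteps, input_dim))
h_forward = run(forward_cell, sequence)
h_backward = run(backward_cell, sequence[::-1])

# Merge the two directions by concatenation: output size is 2 * units
h_bidirectional = np.concatenate([h_forward, h_backward])
print(h_bidirectional.shape)
```

Because the two directions have separate weights, a bidirectional layer has twice the parameters of its unidirectional counterpart.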

For all types of RNNs, increasing the number of units will also increase the capacity. Another way of increasing the capacity is by stacking the RNN layers. As a general rule of thumb, though, the capacity of the model should only be increased if needed. Excess capacity may contribute to overfitting, and it also leads to longer training time and slower prediction.
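Stacking means that the second recurrent layer consumes the full sequence of hidden states produced by the first, not just its final state. In Keras this corresponds to setting return_sequences=True on every recurrent layer except the last. The NumPy sketch below illustrates the data flow with a simple tanh cell, random weights, and small hypothetical sizes; make_layer is an illustrative name.

```python
import numpy as np

rng = np.random.default_rng(3)


def make_layer(units, input_dim):
    """Create a tanh RNN layer that returns the state at every timestep."""
    b = np.zeros(units)
    W = rng.normal(scale=0.5, size=(units, units))
    U = rng.normal(scale=0.5, size=(units, input_dim))

    def layer(sequence):
        h = np.zeros(units)
        states = []
        for x_t in sequence:
            h = np.tanh(b + W @ h + U @ x_t)
            states.append(h)
        return np.stack(states)   # shape: (timesteps, units)

    return layer


timesteps, input_dim = 6, 3
layer1 = make_layer(units=5, input_dim=input_dim)
layer2 = make_layer(units=4, input_dim=5)   # input_dim = units of layer1

sequence = rng.random((timesteps, input_dim))
states1 = layer1(sequence)    # full sequence of layer-1 states
states2 = layer2(states1)     # layer 2 treats those states as its input
final_output = states2[-1]    # last state, as fed to a classifier head
print(states1.shape, states2.shape, final_output.shape)
```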
