Classification tasks and decision boundaries

So far, the focus of this chapter has been on regression. In this section, we will discuss another important task: classification. Let us first understand the difference between regression (also sometimes referred to as prediction) and classification:

  • In classification, the data is grouped into classes/categories, while in regression, the aim is to predict a continuous numerical value for the given data. For example, recognizing handwritten digits is a classification task; every handwritten digit belongs to one of the ten classes between 0 and 9. Predicting the price of a house from different input variables is a regression task.
  • In a classification task, the model finds the decision boundaries separating one class from another. In the regression task, the model approximates a function that fits the input-output relationship.
  • Classification can be viewed as a special case of regression in which we predict discrete classes; regression is more general.

Figure 2.8 shows how classification and regression tasks differ. In classification, we need to find a line (or a plane or hyperplane in multidimensional space) separating the classes. In regression, the aim is to find a line (or plane or hyperplane) that fits the given input points:

Figure 2.8: Classification vs regression
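To make the idea of a decision boundary concrete, the following minimal sketch (not from the book) fits a single sigmoid unit to two clusters of synthetic two-dimensional points and recovers the line separating them; the data, layer size, and training settings are illustrative assumptions only:

    import numpy as np
    import tensorflow as tf

    # Two synthetic 2D clusters, one per class (illustrative data)
    rng = np.random.default_rng(0)
    x0 = rng.normal(loc=[-1.0, -1.0], scale=0.5, size=(100, 2))
    x1 = rng.normal(loc=[1.0, 1.0], scale=0.5, size=(100, 2))
    x = np.vstack([x0, x1]).astype(np.float32)
    y = np.concatenate([np.zeros(100), np.ones(100)]).astype(np.float32)

    # A single sigmoid unit learns a linear decision boundary between the two classes
    clf = tf.keras.Sequential([tf.keras.layers.Dense(1, activation='sigmoid', input_shape=(2,))])
    clf.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    clf.fit(x, y, epochs=100, verbose=0)

    # The learned weights and bias define the separating line w1*x1 + w2*x2 + b = 0
    w, b = clf.get_weights()
    print("Boundary coefficients:", w.ravel(), "bias:", b)

Points on one side of this line are assigned to one class and points on the other side to the other class; in regression, by contrast, the fitted line itself would be the prediction.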

In the following section, we will explain logistic regression, which is a very common and useful classification technique.

Logistic regression

Logistic regression is used to determine the probability of an event. Conventionally, the event is represented as a binary categorical dependent variable. The probability of the event is expressed using the sigmoid (logistic) function:

$$\hat{Y} = P(Y=1 \mid X) = \frac{1}{1 + e^{-(W^{T}X + b)}}$$

The goal now is to estimate the weights $W$ and the bias term $b$. In logistic regression, the coefficients are estimated using either the maximum likelihood estimator or stochastic gradient descent. If $p$ is the total number of input data points, the loss is conventionally defined as a cross-entropy term given by:

$$loss = -\sum_{i=1}^{p} \left[ Y_i \log(\hat{Y}_i) + (1 - Y_i)\log(1 - \hat{Y}_i) \right]$$

where $Y_i$ is the true label and $\hat{Y}_i$ is the predicted probability for the $i$-th data point.
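As a quick numerical illustration (not from the book), the following sketch evaluates the sigmoid probabilities and the binary cross-entropy loss for a few made-up points; the weights, bias, inputs, and labels are all assumed values:

    import numpy as np

    # Assumed weights, bias, inputs, and binary labels (illustrative values only)
    W = np.array([0.5, -0.25])
    b = 0.1
    X = np.array([[1.0, 2.0], [2.0, 0.5], [-1.0, -1.5]])
    Y = np.array([0.0, 1.0, 0.0])

    # Sigmoid maps the linear score W.x + b to a probability in (0, 1)
    Y_hat = 1.0 / (1.0 + np.exp(-(X @ W + b)))

    # Binary cross-entropy loss summed over the data points
    loss = -np.sum(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat))
    print("Probabilities:", Y_hat, "Loss:", loss)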

Logistic regression is used in classification problems. For example, when looking at medical data, we can use logistic regression to classify whether a person has cancer or not. If the output categorical variable has more than two levels, we can use multinomial logistic regression. Another common technique for handling more than two output classes is one versus all (one-vs-rest), sketched below.
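Here is a minimal one-versus-all sketch (an illustration, not the book's code): it trains one binary sigmoid classifier per class and, at prediction time, picks the class whose classifier is most confident. The x_train, y_train, and num_classes arguments are assumptions about the caller's data:

    import numpy as np
    import tensorflow as tf

    def fit_one_vs_all(x_train, y_train, num_classes, epochs=20):
        # Train one binary classifier per class: "class k" versus "everything else"
        models = []
        for k in range(num_classes):
            y_binary = (y_train == k).astype(np.float32)
            m = tf.keras.Sequential([
                tf.keras.layers.Dense(1, activation='sigmoid',
                                      input_shape=(x_train.shape[1],))
            ])
            m.compile(optimizer='adam', loss='binary_crossentropy')
            m.fit(x_train, y_binary, epochs=epochs, verbose=0)
            models.append(m)
        return models

    def predict_one_vs_all(models, x):
        # Pick the class whose binary classifier outputs the highest probability
        scores = np.hstack([m.predict(x, verbose=0) for m in models])
        return np.argmax(scores, axis=1)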

For multiclass logistic regression, the cross-entropy loss function is modified as:

$$loss = -\sum_{i=1}^{p} \sum_{k=1}^{K} Y_{ik} \log(\hat{Y}_{ik})$$

where K is the total number of classes. You can read more about logistic regression at https://en.wikipedia.org/wiki/Logistic_regression.
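For reference, the following sketch (not from the book) evaluates this multiclass loss both by hand and with tf.keras.losses.CategoricalCrossentropy; the predicted probabilities and one-hot labels are made-up examples. Note that Keras averages the per-sample loss rather than summing it:

    import numpy as np
    import tensorflow as tf

    # Made-up predicted probabilities for p=2 points over K=3 classes
    Y_hat = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.3, 0.6]])
    # One-hot true labels: the first point is class 0, the second is class 2
    Y = np.array([[1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])

    # Cross-entropy summed over points and classes, as in the formula above
    manual_loss = -np.sum(Y * np.log(Y_hat))

    # Keras returns the mean per-sample cross-entropy, so it equals manual_loss / 2 here
    keras_loss = tf.keras.losses.CategoricalCrossentropy()(Y, Y_hat).numpy()
    print(manual_loss, keras_loss * 2)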

Now that you have some idea about logistic regression, let us see how we can apply it to any dataset.

Logistic regression on the MNIST dataset

Next, we will use TensorFlow Keras to classify handwritten digits using logistic regression. We will be using the MNIST (Modified National Institute of Standards and Technology) dataset. For those working in the field of deep learning, MNIST is nothing new; it is like the ABC of machine learning. It contains images of handwritten digits and a label for each image indicating which digit it is. The label is a value between 0 and 9, depending on the handwritten digit; thus, this is a multiclass classification problem.

To implement logistic regression, we will build a model with only one dense layer. Each class is represented by one unit in the output, so since we have 10 classes, the output layer has 10 units. The probability function used in logistic regression is the sigmoid function; therefore, we use sigmoid activation.

Let us build our model:

  1. The first step is, as always, importing the modules needed. Notice that here we are using another useful layer from the Keras API, the Flatten layer. The Flatten layer reshapes the 28 x 28 two-dimensional input images of the MNIST dataset into a flattened one-dimensional array of 784 elements:
    import tensorflow as tf
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
    import tensorflow.keras as K
    from tensorflow.keras.layers import Dense, Flatten
    
  2. We take the input data of MNIST from the tensorflow.keras dataset:
    ((train_data, train_labels),(test_data, test_labels)) = tf.keras.datasets.mnist.load_data()
    
  3. Next, we preprocess the data. We normalize the images; the MNIST images are grayscale, with the intensity value of each pixel lying between 0 and 255. We divide the pixel values by 255 so that they lie between 0 and 1:
    train_data = train_data/np.float32(255)
    train_labels = train_labels.astype(np.int32)  
    test_data = test_data/np.float32(255)
    test_labels = test_labels.astype(np.int32)
    
  4. Now, we define a very simple model; it has only one Dense layer with 10 units, and it takes an input of size 784. You can see from the output of the model summary that only the Dense layer has trainable parameters:
    model = K.Sequential([
        Flatten(input_shape=(28, 28)),
        Dense(10, activation='sigmoid')
    ])
    model.summary()
    
    Model: "sequential"
    ____________________________________________________________
     Layer (type)           Output Shape              Param #   
    ============================================================
     flatten (Flatten)      (None, 784)               0         
                                                                
     dense (Dense)          (None, 10)                7850      
                                                                
    ============================================================
    Total params: 7,850
    Trainable params: 7,850
    Non-trainable params: 0
    ____________________________________________________________
    
  5. Since the labels are integer values, we will use the SparseCategoricalCrossentropy loss with from_logits set to True. The optimizer selected is Adam. We also define accuracy as a metric to be logged as the model is trained. We train our model for 50 epochs, with a train-validation split of 80:20:
    model.compile(optimizer='adam', loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), metrics=['accuracy'])
    history = model.fit(x=train_data,y=train_labels, epochs=50, verbose=1, validation_split=0.2)
    
  6. Let us see how our simple model has fared by plotting the loss curves. You can see that the training and validation losses diverge: as the training loss decreases, the validation loss increases, so the model is overfitting. You can improve the model performance by adding hidden layers:
    plt.plot(history.history['loss'], label='loss')
    plt.plot(history.history['val_loss'], label='val_loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()
    plt.grid(True)
    

Figure 2.9: Loss plot

  7. To better understand the results, we build two utility functions; these functions help us visualize the handwritten digits and the probabilities of the 10 output units:
    def plot_image(i, predictions_array, true_label, img):
        true_label, img = true_label[i], img[i]
        plt.grid(False)
        plt.xticks([])
        plt.yticks([])
        plt.imshow(img, cmap=plt.cm.binary)
        predicted_label = np.argmax(predictions_array)
        if predicted_label == true_label:
          color ='blue'
        else:
          color ='red'
        plt.xlabel("Pred {} Conf: {:2.0f}% True ({})".format(predicted_label,
                                      100*np.max(predictions_array),
                                      true_label),
                                      color=color)
    def plot_value_array(i, predictions_array, true_label):
        true_label = true_label[i]
        plt.grid(False)
        plt.xticks(range(10))
        plt.yticks([])
        thisplot = plt.bar(range(10), predictions_array,
                           color="#777777")
        plt.ylim([0, 1])
        predicted_label = np.argmax(predictions_array)
        thisplot[predicted_label].set_color('red')
        thisplot[true_label].set_color('blue')
    
  8. Using these utility functions, we plot the predictions:
    predictions = model.predict(test_data)
    i = 56
    plt.figure(figsize=(10,5))
    plt.subplot(1,2,1)
    plot_image(i, predictions[i], test_labels, test_data)
    plt.subplot(1,2,2)
    plot_value_array(i, predictions[i],  test_labels)
    plt.show()
    
  9. The plot on the left is the image of the handwritten digit, along with the predicted label, the confidence in the prediction, and the true label. The image on the right shows the probability (logistic) output of the 10 units; we can see that the unit representing the number 4 has the highest probability:

Figure 2.10: Predicted digit and confidence value of the prediction

  10. In this code, to stay true to logistic regression, we used a sigmoid activation function and only one Dense layer. For better performance, adding hidden Dense layers and using softmax as the final activation function will help. For example, the following model reaches 97% accuracy on the validation dataset:
    better_model = K.Sequential([
        Flatten(input_shape=(28, 28)),
        Dense(128,  activation='relu'),
        Dense(10, activation='softmax')
    ])
    better_model.summary()
    

You can experiment by adding more layers, changing the number of neurons in each layer, or even changing the optimizer. This will give you a better understanding of how these parameters influence model performance.
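As a starting point for such experiments, here is a sketch of how one might compile, train, and evaluate better_model; the optimizer, number of epochs, and validation split below are illustrative choices, not settings from the book:

    # Illustrative training setup for better_model (settings are assumptions)
    better_model.compile(optimizer='adam',
                         loss='sparse_categorical_crossentropy',
                         metrics=['accuracy'])
    history = better_model.fit(train_data, train_labels,
                               epochs=10,
                               validation_split=0.2,
                               verbose=1)

    # Evaluate generalization on the held-out test set
    test_loss, test_acc = better_model.evaluate(test_data, test_labels, verbose=0)
    print(f"Test accuracy: {test_acc:.4f}")

Swapping 'adam' for another optimizer such as 'sgd' or 'rmsprop' in the compile call, or changing the number of units in the hidden Dense layer, is an easy way to see how each choice affects convergence and accuracy.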