Deep Learning with TensorFlow 2 and Keras - Second Edition

By : Antonio Gulli, Amita Kapoor, Sujit Pal

Overview of this book

Deep Learning with TensorFlow 2 and Keras, Second Edition teaches neural networks and deep learning techniques alongside TensorFlow (TF) and Keras. You’ll learn how to write deep learning applications in the most powerful, popular, and scalable machine learning stack available. TensorFlow is the machine learning library of choice for professional applications, while Keras offers a simple and powerful Python API for accessing TensorFlow. TensorFlow 2 provides full Keras integration, making advanced machine learning easier and more convenient than ever before. This book also introduces neural networks with TensorFlow, runs through the main applications (regression, ConvNets (CNNs), GANs, RNNs, NLP), covers two working example apps, and then dives into TF in production, TF mobile, and using TensorFlow with AutoML.

Multi-layer perceptron – our first example of a network

In this chapter, we present our first example of a network with multiple dense layers. Historically, "perceptron" was the name given to a model having a single linear layer; consequently, a model with multiple layers is called a multi-layer perceptron (MLP). Note that the input and the output layers are visible from outside, while all the other layers in the middle are hidden, hence the name hidden layers. In this context, a single layer is simply a linear function, and the MLP is therefore obtained by stacking multiple single layers one after the other:

Figure 4: An example of a multiple layer perceptron

In Figure 4, each node in the first hidden layer receives an input and "fires" (outputs 0 or 1) according to the value of its associated linear function. Then, the output of the first hidden layer is passed to the second layer, where another linear function is applied, and the results are passed to the final output layer, which consists of a single neuron. Interestingly, this layered organization vaguely resembles the organization of the human visual system, as we discussed earlier.
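The layer-by-layer flow just described can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the book's code: it uses one hidden layer instead of two for brevity (the same loop handles any number of layers), a step activation to model "firing" 0 or 1, and hypothetical hand-picked weights that happen to make the network compute XOR. In TensorFlow the same structure would be built by stacking tf.keras.layers.Dense layers, with a differentiable activation so it can be trained.

```python
# A minimal sketch of a multi-layer perceptron forward pass with step
# ("fire 0 or 1") activations. The weights are hypothetical, hand-picked
# so that the network computes XOR; they are not taken from the book.

def step(z):
    """Fire 1 if the weighted input is positive, otherwise 0."""
    return 1 if z > 0 else 0

def dense(inputs, weights, biases, activation):
    """One fully connected layer: each row of `weights` feeds one neuron."""
    return [
        activation(sum(w_i * x_i for w_i, x_i in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def mlp_forward(inputs, layers):
    """Pass the inputs through each (weights, biases) layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = dense(activations, weights, biases, step)
    return activations

# Hypothetical 2-2-1 network: the hidden layer computes OR and NAND,
# and the single output neuron computes the AND of the two, giving XOR.
xor_layers = [
    ([[1.0, 1.0], [-1.0, -1.0]], [-0.5, 1.5]),  # hidden layer: OR, NAND
    ([[1.0, 1.0]], [-1.5]),                      # output layer: AND
]

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", mlp_forward([x1, x2], xor_layers)[0])
```

No single linear layer with a step output can compute XOR, which is one concrete reason to stack layers.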

Problems in training the perceptron and their solutions

Let's consider a single neuron; what are the best choices for the weight w and the bias b? Ideally, we would like to provide a set of training examples and let the computer adjust the weight and the bias in such a way that the errors produced in the output are minimized.

In order to make this a bit more concrete, let's suppose that we have a set of images of cats and another separate set of images not containing cats. Suppose that each neuron receives input from the value of a single pixel in the images. While the computer processes those images, we would like our neuron to adjust its weights and its bias so that we have fewer and fewer images wrongly recognized.

This approach seems very intuitive, but it requires that a small change in the weights (or the bias) causes only a small change in the outputs. Think about it: if the output jumps abruptly, we cannot learn progressively. After all, kids learn little by little. Unfortunately, the perceptron does not show this "little-by-little" behavior. A perceptron's output is either 0 or 1, and that is a big jump that does not help learning (see Figure 5):

Figure 5: Example of perceptron - either a 0 or 1
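The jump can be seen numerically with a one-input perceptron. In this sketch the input, bias, and weight values are arbitrary illustrations: a tiny nudge to the weight flips the output from 0 to 1, while the same nudge away from the threshold changes nothing, so the output alone gives no hint about which way to adjust the weight.

```python
# A sketch of why a step (perceptron) output makes gradual learning hard.

def perceptron(x, w, b):
    """Classic perceptron: output 1 if w*x + b > 0, otherwise 0."""
    return 1 if w * x + b > 0 else 0

x, b = 1.0, 0.0
before = perceptron(x, w=-0.001, b=b)  # w*x + b = -0.001 -> 0
after = perceptron(x, w=0.001, b=b)    # w*x + b = +0.001 -> 1
print(before, after)  # 0 1: a tiny weight change, a full jump in output

# Away from the threshold, the same tiny nudge has no visible effect at all.
print(perceptron(x, w=0.5, b=b), perceptron(x, w=0.502, b=b))  # 1 1
```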

We need something different, something smoother: a function that changes progressively from 0 to 1 with no discontinuity. Mathematically, this means that we need a continuous function whose derivative we can compute (that is, a differentiable function). You might remember that in mathematics the derivative is the rate at which a function changes at a given point. For functions of a real variable, the derivative is the slope of the tangent line to the graph at that point. Later in this chapter, we will see why derivatives are important for learning, when we talk about gradient descent.
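To make "the derivative is the slope of the tangent line" concrete, here is a small sketch that estimates it with a central difference, (f(x + h) - f(x - h)) / (2h). The step size h and the test points are arbitrary choices. The step function's estimated slope is zero away from its jump, so it carries no learning signal; a smooth 0-to-1 curve (here the logistic function, which is the sigmoid discussed below) has a nonzero slope that indicates which direction to move.

```python
import math

def numerical_derivative(f, x, h=1e-5):
    """Approximate f'(x) by the slope of a short secant centred on x."""
    return (f(x + h) - f(x - h)) / (2 * h)

def step(x):
    """Perceptron-style output: jumps from 0 to 1 at x = 0."""
    return 1.0 if x > 0 else 0.0

def smooth(x):
    """A smooth 0-to-1 curve (the logistic function)."""
    return 1.0 / (1.0 + math.exp(-x))

print(numerical_derivative(lambda x: x ** 2, 3.0))  # close to 6.0, since d/dx x^2 = 2x
print(numerical_derivative(step, 0.7))    # 0.0: flat, no learning signal
print(numerical_derivative(smooth, 0.7))  # positive: the output grows with x
```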

Activation function – sigmoid

The sigmoid function, defined as $\sigma(x) = \frac{1}{1 + e^{-x}}$, produces outputs that change gradually within the range (0, 1) as the input varies over $(-\infty, \infty)$. Mathematically, the function is continuous and differentiable everywhere. A typical sigmoid function is represented in Figure 6:

Figure 6: A sigmoid function with output in the range (0,1)

A neuron can use the sigmoid for computing the nonlinear function $\sigma(z = wx + b)$. Note that if $z = wx + b$ is very large and positive, then $e^{-z} \to 0$, so $\sigma(z) \to 1$, while if $z = wx + b$ is very large and negative, then $e^{-z} \to \infty$, so $\sigma(z) \to 0$. In other words, a neuron with sigmoid activation behaves similarly to the perceptron, but the changes are gradual and output values such as 0.5539 or 0.123191 are perfectly legitimate. In this sense, a sigmoid neuron can answer "maybe."
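A minimal sketch of a sigmoid neuron follows. The weight and bias values are arbitrary illustrations. The sigmoid is written in a numerically stable form (an implementation choice, not something the text prescribes): for very negative z, the naive 1 / (1 + e^(-z)) would call math.exp on a huge positive number and raise an OverflowError, so that branch is rewritten as e^z / (1 + e^z).

```python
import math

def sigmoid(z):
    """Logistic sigmoid 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For z < 0, the equivalent form e^z / (1 + e^z) never overflows.
    e = math.exp(z)
    return e / (1.0 + e)

def sigmoid_neuron(x, w, b):
    """A single neuron: weighted input z = w*x + b passed through the sigmoid."""
    return sigmoid(w * x + b)

w, b = 2.0, -1.0  # hypothetical weight and bias
for x in (-500.0, 0.0, 0.5, 1.0, 500.0):
    # Very negative z gives ~0, very positive z gives ~1, and values
    # in between are graded "maybe" answers.
    print(x, round(sigmoid_neuron(x, w, b), 4))
```

The output rises smoothly from about 0 through exactly 0.5 (where z = 0) to about 1, which is the perceptron's behavior with the jump replaced by a gradual transition.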

Activation function – tanh

Another useful activation function is tanh, defined as $\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$. Its shape is shown in Figure 7, and its outputs range from -1 to 1:

Figure 7: Tanh activation function
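The tanh formula translates directly into code. The sketch below checks it against Python's math.tanh and against the identity tanh(z) = 2σ(2z) - 1, which shows that tanh is a rescaled and shifted sigmoid whose outputs are centred on zero. The direct formula is only for illustration: it overflows for large |z|, so math.tanh is the safer choice in practice.

```python
import math

def tanh(z):
    """tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z)); output in (-1, 1)."""
    return (math.exp(z) - math.exp(-z)) / (math.exp(z) + math.exp(-z))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

for z in (-3.0, -0.5, 0.0, 0.5, 3.0):
    # Direct formula, standard library, and the sigmoid identity side by side.
    print(z, round(tanh(z), 4), round(math.tanh(z), 4),
          round(2 * sigmoid(2 * z) - 1, 4))
```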

Activation function – ReLU

The sigmoid is not the only kind of activation function used for neural networks. Recently, a very simple function named ReLU (Rectified Linear Unit) became very popular because it helps address some optimization problems observed with sigmoids. We will discuss these problems in more detail when we talk about vanishing gradient in Chapter 9, Autoencoders. A ReLU is simply defined as f(x) = max(0, x), and this non-linear function is represented in Figure 8. As you can see, the function is zero for negative values and grows linearly for positive values. The ReLU is also very simple to implement (generally, three instructions are enough), while the sigmoid requires a few orders of magnitude more. This helped to squeeze neural networks onto early GPUs:

Figure 8: A ReLU function
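ReLU and its slope fit in a couple of lines, with no exponential to evaluate, unlike the sigmoid. The derivative is undefined at exactly x = 0. Returning 0 there is a common convention and an assumption of this sketch.

```python
def relu(x):
    """ReLU: f(x) = max(0, x)."""
    return max(0.0, x)

def relu_derivative(x):
    """Slope of ReLU: 0 for x < 0 and 1 for x > 0. At x = 0 the derivative
    is undefined; this sketch uses 0 there by convention."""
    return 1.0 if x > 0 else 0.0

inputs = [-2.0, -0.5, 0.0, 0.5, 2.0]
print([relu(x) for x in inputs])             # [0.0, 0.0, 0.0, 0.5, 2.0]
print([relu_derivative(x) for x in inputs])  # [0.0, 0.0, 0.0, 1.0, 1.0]
```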

Two additional activation functions – ELU and LeakyReLU

Sigmoid and ReLU are not the only activation functions used for learning.

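ELU is defined as $f(\alpha, x) = \begin{cases} \alpha(e^{x} - 1) & \text{if } x \le 0 \\ x & \text{if } x > 0 \end{cases}$ for $\alpha > 0$, and its plot is represented in Figure 9: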

Figure 9: An ELU function
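A direct transcription of the ELU definition follows. The default alpha = 1.0 is an assumption made for illustration. For negative inputs the output saturates smoothly towards -alpha instead of being clipped to zero.

```python
import math

def elu(x, alpha=1.0):
    """ELU: alpha * (e^x - 1) for x <= 0, and x for x > 0 (alpha > 0)."""
    return alpha * (math.exp(x) - 1.0) if x <= 0 else x

for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(x, round(elu(x), 4))
```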

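LeakyReLU is defined as $f(\alpha, x) = \begin{cases} \alpha x & \text{if } x \le 0 \\ x & \text{if } x > 0 \end{cases}$ for $\alpha \in (0, 1)$, and its plot is represented in Figure 10: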

Figure 10: A LeakyReLU function
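LeakyReLU is equally short to write. The value alpha = 0.01 is an assumed, commonly used small slope; any value in (0, 1) fits the definition. Negative inputs are scaled down rather than zeroed.

```python
def leaky_relu(x, alpha=0.01):
    """LeakyReLU: alpha * x for x <= 0, and x for x > 0 (0 < alpha < 1)."""
    return alpha * x if x <= 0 else x

for x in (-100.0, -1.0, 0.0, 1.0, 100.0):
    print(x, leaky_relu(x))
```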

Both functions allow small updates when x is negative, which might be useful in certain conditions.
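Why "small updates when x is negative" matters can be seen by comparing slopes on the negative side. The sketch below reuses the definitions above, with alpha values that are assumptions (1.0 for ELU, 0.01 for LeakyReLU). ReLU's slope there is exactly zero, so a neuron whose weighted input stays negative passes back no gradient and its weights stop changing. ELU and LeakyReLU keep a small nonzero slope.

```python
import math

def relu_grad(x):
    """Slope of max(0, x); zero for every negative input."""
    return 1.0 if x > 0 else 0.0

def elu_grad(x, alpha=1.0):
    """Slope of ELU: d/dx alpha * (e^x - 1) = alpha * e^x on the negative side."""
    return alpha * math.exp(x) if x <= 0 else 1.0

def leaky_relu_grad(x, alpha=0.01):
    """Slope of LeakyReLU: a constant alpha on the negative side."""
    return alpha if x <= 0 else 1.0

for x in (-3.0, -0.5):
    print(x, relu_grad(x), round(elu_grad(x), 4), leaky_relu_grad(x))
```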

Activation functions

Sigmoid, tanh, ELU, LeakyReLU, and ReLU are generally called activation functions in neural network jargon. In the gradient descent section, we will see that the gradual changes typical of sigmoid and ReLU functions are the basic building blocks for developing a learning algorithm that adapts little by little, progressively reducing the mistakes made by our nets. An example of an activation function applied to an input vector (x₁, x₂,..., x_m), a weight vector (w₁, w₂,..., w_m), a bias b, and a summation is given in Figure 11. Note that TensorFlow 2.0 supports many activation functions, a full list of which is available online:

Figure 11: An example of an activation function applied after a linear function
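The computation in Figure 11 (a weighted sum of the inputs plus a bias, followed by an activation) can be sketched as a single function that takes the activation as a parameter. The input vector, weights, and bias below are hypothetical values chosen for illustration. In TensorFlow, ready-made versions of these activations are available, for example as tf.keras.activations.sigmoid, tf.keras.activations.tanh, and tf.keras.activations.relu.

```python
import math

def neuron(inputs, weights, bias, activation):
    """Compute activation(sum_i w_i * x_i + b) for one neuron."""
    z = sum(w_i * x_i for x_i, w_i in zip(inputs, weights)) + bias
    return activation(z)

activations = {
    "sigmoid": lambda z: 1.0 / (1.0 + math.exp(-z)),
    "tanh": math.tanh,
    "relu": lambda z: max(0.0, z),
}

x = [0.5, -1.0, 2.0]   # hypothetical input vector (x1, x2, x3)
w = [0.4, 0.3, -0.2]   # hypothetical weight vector (w1, w2, w3)
b = 0.1                # hypothetical bias
for name, f in activations.items():
    # The same weighted sum z is reused; only the activation changes.
    print(name, round(neuron(x, w, b, f), 4))
```

Keeping the activation as a parameter mirrors how deep learning libraries let you swap activations without touching the rest of the layer.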

In short – what are neural networks after all?

In one sentence, machine learning models are a way to compute a function that maps some inputs to their corresponding outputs. The function is nothing more than a number of addition and multiplication operations. However, when combined with a non-linear activation and stacked in multiple layers, these functions can learn almost anything [8]. You also need a meaningful metric capturing what you want to optimize (this being the so-called loss function that we will cover later in the book), enough data to learn from, and sufficient computational power.
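The role of the non-linear activation can be checked with a little algebra: two stacked linear layers, $w_2(w_1 x + b_1) + b_2$, always equal a single linear layer with weight $w_2 w_1$ and bias $w_2 b_1 + b_2$, so stacking alone adds no expressive power. The sketch below uses scalars and arbitrary parameter values for brevity (the same identity holds for weight matrices) and shows that inserting a ReLU between the layers breaks this collapse.

```python
def two_linear_layers(x, w1, b1, w2, b2):
    """Two stacked linear layers with no activation in between."""
    return w2 * (w1 * x + b1) + b2

def collapsed_layer(x, w1, b1, w2, b2):
    """The equivalent single linear layer: weight w2*w1, bias w2*b1 + b2."""
    return (w2 * w1) * x + (w2 * b1 + b2)

def two_layers_with_relu(x, w1, b1, w2, b2):
    """The same two layers with a ReLU after the first one."""
    return w2 * max(0.0, w1 * x + b1) + b2

params = dict(w1=2.0, b1=-1.0, w2=3.0, b2=0.5)  # arbitrary illustrative values
for x in (-2.0, 0.0, 1.0, 3.0):
    print(x,
          two_linear_layers(x, **params),
          collapsed_layer(x, **params),
          two_layers_with_relu(x, **params))
```

The first two columns always match; the ReLU version flattens out for inputs below the kink, which no single linear layer can reproduce.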

Now, it might be beneficial to stop for a moment and ask ourselves what "learning" really is. For our purposes, we can say that learning is essentially a process aimed at generalizing established observations [9] in order to predict future results. In short, this is exactly the goal we want to achieve with neural networks.
