Mastering PyTorch - Second Edition

By: Ashish Ranjan Jha

Overview of this book

PyTorch is making it easier than ever before for anyone to build deep learning applications. This PyTorch deep learning book will help you uncover expert techniques to get the most out of your data and build complex neural network models. You’ll build convolutional neural networks for image classification and recurrent neural networks and transformers for sentiment analysis. As you advance, you'll apply deep learning across different domains, such as music, text, and image generation, using generative models, including diffusion models. You'll not only build and train your own deep reinforcement learning models in PyTorch but also learn to optimize model training using multiple CPUs, GPUs, and mixed-precision training. You’ll deploy PyTorch models to production, including mobile devices. Finally, you’ll discover the PyTorch ecosystem and its rich set of libraries. These libraries will add another set of tools to your deep learning toolbelt, teaching you how to use fastai to prototype models and PyTorch Lightning to train models. You’ll discover libraries for AutoML and explainable AI (XAI), create recommendation systems, and build language and vision transformers with Hugging Face. By the end of this book, you'll be able to perform complex deep learning tasks using PyTorch to build smart artificial intelligence models.

Why are CNNs so powerful?

CNNs are among the most powerful machine learning models at solving challenging problems such as image classification, object detection, object segmentation, video processing, natural language processing, and speech recognition. Their success is attributed to various factors, such as the following:

  • Weight sharing: This makes CNNs parameter-efficient; that is, the same set of kernel weights (parameters) is applied at every spatial position of the input, so a feature can be detected wherever it occurs without location-specific weights. Features are the high-level representations of input data that the model generates with its parameters.
  • Automatic feature extraction: Multiple feature extraction stages help a CNN to automatically learn feature representations in a dataset.
  • Hierarchical learning: The multi-layered CNN structure helps CNNs to learn low-, mid-, and high-level features.
  • The ability to exploit both spatial and temporal correlations in the data, such as in video-processing tasks.
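To make the parameter efficiency of weight sharing concrete, the following minimal sketch compares the parameter count of one convolutional layer with that of a fully connected layer producing an output of the same size. The layer sizes (a 32x32 RGB input, 16 output channels, a 3x3 kernel) and the helper names are illustrative assumptions, not values taken from the book.

```python
# Hypothetical helpers for counting learnable parameters (weights + biases).

def conv2d_param_count(in_channels, out_channels, kernel_size):
    # One kernel_size x kernel_size filter per (input, output) channel pair,
    # reused at every spatial position, plus one bias per output channel.
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


def dense_param_count(in_features, out_features):
    # One weight per (input, output) pair, plus one bias per output.
    return in_features * out_features + out_features


height = width = 32  # assumed input resolution
conv_params = conv2d_param_count(in_channels=3, out_channels=16, kernel_size=3)
dense_params = dense_param_count(3 * height * width, 16 * height * width)

print(conv_params)   # 448
print(dense_params)  # 50348032
```

The convolutional count does not depend on the image resolution, whereas the dense count grows with the product of input and output sizes.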

Besides these pre-existing fundamental characteristics, CNNs have advanced over the years with the help of improvements in the following areas:

  • The use of better activation and loss functions, such as using ReLU to mitigate the vanishing gradient problem.
  • Parameter optimization, such as using an optimizer based on Adaptive Moment Estimation (Adam) instead of simple stochastic gradient descent.
  • Regularization: Applying dropout and batch normalization in addition to L2 regularization.
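As a rough illustration of how these improvements typically appear together in PyTorch, the sketch below builds a tiny CNN with ReLU, batch normalization, and dropout, and runs one training step with Adam. The layer sizes, learning rate, and random data are placeholder assumptions; `weight_decay` in `torch.optim.Adam` is used here as the L2 penalty.

```python
import torch
from torch import nn

# A deliberately small model; all sizes are illustrative assumptions.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),       # batch normalization
    nn.ReLU(),                # ReLU activation
    nn.AdaptiveAvgPool2d(1),  # collapse each feature map to one value
    nn.Flatten(),
    nn.Dropout(p=0.5),        # dropout
    nn.Linear(16, 10),
)

# Adam optimizer; weight_decay adds an L2 penalty on the parameters.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Random stand-in data: 8 RGB images of size 32x32 with 10 possible classes.
images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))

# One training step.
model.train()
optimizer.zero_grad()
logits = model(images)
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
```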

FAQ – What is the vanishing gradient problem?

Backpropagation in neural networks relies on the chain rule of differentiation. According to the chain rule, the gradient of the loss function with respect to the parameters of the earliest layers (those closest to the input) can be written as a product of gradients at each subsequent layer. If these gradients all have a magnitude less than 1 – and worse still, tend toward 0 – then their product will be a vanishingly small value. The vanishing gradient problem can seriously hamper optimization by preventing the parameters of the early layers from changing their values, which amounts to stunted learning.
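The effect of multiplying many small factors can be seen with a few lines of plain Python. This is a simplified sketch: it ignores the weight matrices and considers only the derivative of the activation function at each layer, assuming 20 layers and a sigmoid evaluated at 0, where its derivative reaches its maximum of 0.25. For comparison, an active ReLU unit has a derivative of 1.

```python
import math


def sigmoid_grad(x):
    # Derivative of the sigmoid function: s(x) * (1 - s(x)).
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def chained_gradient(local_grads):
    # Chain rule: multiply the per-layer gradient factors together.
    product = 1.0
    for g in local_grads:
        product *= g
    return product


depth = 20  # assumed number of layers

# Best case for sigmoid: every layer contributes its maximum derivative, 0.25.
sigmoid_chain = chained_gradient([sigmoid_grad(0.0)] * depth)

# Active ReLU units contribute a derivative of exactly 1 at every layer.
relu_chain = chained_gradient([1.0] * depth)

print(f"{sigmoid_chain:.2e}")  # 9.09e-13
print(relu_chain)              # 1.0
```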

But some of the most significant drivers of development in CNNs over the years have been the various architectural innovations:

  • Spatial exploration-based CNNs: The idea behind spatial exploration is using different kernel sizes in order to explore different levels of visual features in input data. The following diagram shows a sample architecture for a spatial exploration-based CNN model:

Figure 2.1: Spatial exploration-based CNN
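One possible way to express the idea of using several kernel sizes in PyTorch is sketched below. The class name `MultiKernelBlock`, the kernel sizes (1, 3, 5), and the choice to run the branches in parallel and concatenate them are assumptions made for illustration; the architecture in the figure may arrange its kernels differently.

```python
import torch
from torch import nn


class MultiKernelBlock(nn.Module):
    """Hypothetical block applying several kernel sizes to the same input."""

    def __init__(self, in_channels, branch_channels, kernel_sizes=(1, 3, 5)):
        super().__init__()
        # padding = k // 2 keeps the spatial size unchanged for odd k,
        # so the branch outputs can be concatenated along the channel axis.
        self.branches = nn.ModuleList(
            nn.Conv2d(in_channels, branch_channels, kernel_size=k, padding=k // 2)
            for k in kernel_sizes
        )

    def forward(self, x):
        # Each branch looks at a different receptive field size.
        return torch.cat([branch(x) for branch in self.branches], dim=1)


block = MultiKernelBlock(in_channels=3, branch_channels=8)
out = block(torch.randn(2, 3, 32, 32))
print(out.shape)  # torch.Size([2, 24, 32, 32])
```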

  • Depth-based CNNs: The depth here refers to the depth of the neural network, that is, the number of layers. So, the idea here is to create a CNN model with multiple convolutional layers in order to extract highly complex visual features. The following diagram shows an example of such a model architecture:

Figure 2.2: Depth-based CNN
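A minimal sketch of the depth idea follows: a hypothetical helper, `make_deep_cnn`, stacks a configurable number of 3x3 convolutional layers. The channel count and layer numbers are arbitrary assumptions chosen to keep the example small.

```python
import torch
from torch import nn


def make_deep_cnn(num_layers, channels=16):
    """Stack `num_layers` conv + ReLU pairs (illustrative helper)."""
    layers = []
    in_channels = 3  # assumed RGB input
    for _ in range(num_layers):
        layers += [nn.Conv2d(in_channels, channels, kernel_size=3, padding=1), nn.ReLU()]
        in_channels = channels
    return nn.Sequential(*layers)


shallow = make_deep_cnn(num_layers=2)
deep = make_deep_cnn(num_layers=8)

# Count the convolutional layers in each model.
num_shallow = sum(isinstance(m, nn.Conv2d) for m in shallow)
num_deep = sum(isinstance(m, nn.Conv2d) for m in deep)
print(num_shallow, num_deep)  # 2 8

# Both models accept the same input; only the depth differs.
features = deep(torch.randn(1, 3, 32, 32))
```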

  • Width-based CNNs: Width refers to the number of channels or feature maps in the data or features extracted from the data. So, width-based CNNs are all about increasing the number of feature maps as we go from the input to the output layers, as demonstrated in the following diagram:

Figure 2.3: Width-based CNN
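The growth in the number of feature maps can be sketched as follows. The channel progression 3, 16, 32, 64 and the use of max pooling after each convolution are illustrative assumptions rather than the exact configuration shown in the figure.

```python
import torch
from torch import nn

# Assumed channel progression from input to output.
widths = [3, 16, 32, 64]

layers = []
for c_in, c_out in zip(widths[:-1], widths[1:]):
    layers += [
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),  # more feature maps
        nn.ReLU(),
        nn.MaxPool2d(2),                                    # halve height and width
    ]
model = nn.Sequential(*layers)

# Track the tensor shape after each pooling stage.
x = torch.randn(1, 3, 32, 32)
shapes = []
for layer in model:
    x = layer(x)
    if isinstance(layer, nn.MaxPool2d):
        shapes.append(tuple(x.shape))

print(shapes)  # [(1, 16, 16, 16), (1, 32, 8, 8), (1, 64, 4, 4)]
```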

  • Multi-path-based CNNs: The preceding three types of architectures connect layers strictly in sequence; that is, direct connections exist only between consecutive layers. Multi-path CNNs introduced the idea of shortcut connections, or skip connections, between non-consecutive layers. The following diagram shows an example of a multi-path CNN model architecture:

Figure 2.4: Multi-path CNN

A key advantage of multi-path architectures is a better flow of information across several layers, thanks to the skip connections. This, in turn, also lets the gradient flow back to the early layers without too much dissipation.
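The gradient-flow benefit can be demonstrated with a small sketch. The class `SkipBlock` below is a hypothetical residual-style block, not the architecture from the figure. To isolate the effect of the shortcut, the example deliberately zeroes all convolution parameters so that the convolutional path contributes nothing; the gradient still reaches the input through the skip connection.

```python
import torch
from torch import nn


class SkipBlock(nn.Module):
    """Hypothetical block with a skip connection around two conv layers."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        # Skip connection: add the block input to the conv path output.
        return self.relu(out + x)


block = SkipBlock(channels=4)

# Worst case for the conv path: zero every weight and bias so that no
# gradient can pass through the convolutions.
with torch.no_grad():
    for p in block.parameters():
        p.zero_()

# Strictly positive input so the final ReLU is active everywhere.
x = (torch.rand(1, 4, 8, 8) + 0.1).requires_grad_()
block(x).sum().backward()

# The gradient arrives undiminished via the skip connection.
print(torch.equal(x.grad, torch.ones_like(x)))  # True
```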

Having looked at the different architectural setups found in CNN models, we will now look at how CNNs have evolved over the years since they were first introduced.