Book Image

Mastering PyTorch - Second Edition

By : Ashish Ranjan Jha
4 (1)
Book Image

Mastering PyTorch - Second Edition

4 (1)
By: Ashish Ranjan Jha

Overview of this book

PyTorch is making it easier than ever before for anyone to build deep learning applications. This PyTorch deep learning book will help you uncover expert techniques to get the most out of your data and build complex neural network models. You’ll build convolutional neural networks for image classification and recurrent neural networks and transformers for sentiment analysis. As you advance, you'll apply deep learning across different domains, such as music, text, and image generation, using generative models, including diffusion models. You'll not only build and train your own deep reinforcement learning models in PyTorch but also learn to optimize model training using multiple CPUs, GPUs, and mixed-precision training. You’ll deploy PyTorch models to production, including mobile devices. Finally, you’ll discover the PyTorch ecosystem and its rich set of libraries. These libraries will add another set of tools to your deep learning toolbelt, teaching you how to use fastai to prototype models and PyTorch Lightning to train models. You’ll discover libraries for AutoML and explainable AI (XAI), create recommendation systems, and build language and vision transformers with Hugging Face. By the end of this book, you'll be able to perform complex deep learning tasks using PyTorch to build smart artificial intelligence models.
Table of Contents (21 chapters)
20
Index

Discussing ResNet and DenseNet architectures

In the previous section, we explored Inception models, which have a reduced number of model parameters as the number of layers increased, thanks to the 1x1 convolutions and global average pooling. Furthermore, auxiliary classifiers were used to combat the vanishing gradient problem. In this section, we will discuss ResNet and DenseNet models.

ResNet

ResNet introduced the concept of skip connections. This simple yet effective trick overcomes the problem of both parameter overflow and vanishing gradients. The idea, as shown in the following diagram, is quite simple. The input is first passed through a non-linear transformation (convolutions followed by non-linear activations) and then the output of this transformation (referred to as the residual) is added to the original input.

Each block of such computation is called a residual block, hence the name of the model – residual network or ResNet:

Figure 3.26 – Skip connections

Figure 2.16: Skip connections

Using these skip (or shortcut) connections, the number of parameters is limited to 26 million parameters for a total of 50 layers (ResNet-50). Due to the limited number of parameters, ResNet has been able to generalize well without overfitting even when the number of layers is increased to 152 (ResNet-152). The following diagram shows the ResNet-50 architecture:

Figure 3.27 – ResNet architecture

Figure 2.17: ResNet architecture

There are two kinds of residual blocks – convolutional and identity, both having skip connections. For the convolutional block, there is an added 1x1 convolutional layer, which further helps to reduce dimensionality. A residual block for ResNet can be implemented in PyTorch as shown here:

class BasicBlock(nn.Module):
    multiplier=1
    def __init__(self, input_num_planes, num_planes, strd=1):
        super(BasicBlock, self).__init__()
        self.conv_layer1 = nn.Conv2d(in_channels=input_num_planes, 
                                     out_channels=num_planes, 
                                     kernel_size=3, 
                                     stride=stride, padding=1, 
                                     bias=False)
        self.batch_norm1 = nn.BatchNorm2d(num_planes)
        self.conv_layer2 = nn.Conv2d(in_channels=num_planes, 
                                     out_channels=num_planes, 
                                     kernel_size=3, stride=1, 
                                     padding=1, bias=False)
        self.batch_norm2 = nn.BatchNorm2d(num_planes)
        self.res_connnection = nn.Sequential()
        if strd > 1 or input_num_planes != self.multiplier*num_planes:
            self.res_connnection = nn.Sequential(
                nn.Conv2d(in_channels=input_num_planes, 
                          out_channels=self.multiplier*num_planes, 
                          kernel_size=1, stride=strd, bias=False),
                nn.BatchNorm2d(self.multiplier*num_planes))
    def forward(self, inp):
        op = F.relu(self.batch_norm1(self.conv_layer1(inp)))
        op = self.batch_norm2(self.conv_layer2(op))
        op += self.res_connnection(inp)
        op = F.relu(op)
        return op

To get started quickly with ResNet, we can always use the pretrained ResNet model from PyTorch’s repository:

import torchvision.models as models
model = models.resnet50(pretrained=True)

ResNet uses the identity function (by directly connecting input to output) to preserve the gradient during backpropagation (as the gradient will be 1). Yet, for extremely deep networks, this principle might not be sufficient to preserve strong gradients from the output layer back to the input layer.

The CNN model we will discuss next is designed to ensure a strong gradient flow, as well as a further reduction in the number of required parameters.

DenseNet

The skip connections of ResNet connected the input of a residual block directly to its output. However, the inter-residual-blocks connection is still sequential; that is, residual block number 3 has a direct connection with block 2 but no direct connection with block 1.

DenseNet, or dense networks, introduced the idea of connecting every convolutional layer with every other layer within what is called a dense block. And every dense block is connected to every other dense block in the overall DenseNet. A dense block is simply a module of two 3x3 densely connected convolutional layers.

These dense connections ensure that every layer is receiving information from all of the preceding layers of the network. This ensures that there is a strong gradient flow from the last layer down to the very first layer. Counterintuitively, the number of parameters of such a network setting will also be low. As every layer receives the feature maps from all the previous layers, the required number of channels (depth) can be fewer. In the earlier models, the increasing depth represented the accumulation of information from earlier layers, but we don’t need that anymore, thanks to the dense connections everywhere in the network.

One key difference between ResNet and DenseNet is also that, in ResNet, the input was added to the output using skip connections. But in the case of DenseNet, the preceding layers’ outputs are concatenated with the current layer’s output. And the concatenation happens in the depth dimension.

This might raise a question about the exploding size of outputs as we proceed further in the network. To combat this compounding effect, a special type of block called the transition block is devised for this network. Composed of a 1x1 convolutional layer followed by a 2x2 pooling layer, this block standardizes or resets the size of the depth dimension so that the output of this block can then be fed to the subsequent dense block(s).

The following diagram shows the DenseNet architecture:

Figure 3.28 – DenseNet architecture

Figure 3.28 – DenseNet architecture

Figure 2.18: DenseNet architecture

As mentioned earlier, there are two types of blocks involved – the dense block and the transition block. These blocks can be written as classes in PyTorch in a few lines of code, as shown here:

class DenseBlock(nn.Module):
    def __init__(self, input_num_planes, rate_inc):
        super(DenseBlock, self).__init__()
        self.batch_norm1 = nn.BatchNorm2d(input_num_planes)
        self.conv_layer1 = nn.Conv2d(in_channels=input_num_planes, 
                                     out_channels=4*rate_inc, 
                                     kernel_size=1, bias=False)
        self.batch_norm2 = nn.BatchNorm2d(4*rate_inc)
        self.conv_layer2 = nn.Conv2d(in_channels=4*rate_inc, 
                                     out_channels=rate_inc, 
                                     kernel_size=3, padding=1, 
                                     bias=False)
    def forward(self, inp):
        op = self.conv_layer1(F.relu(self.batch_norm1(inp)))
        op = self.conv_layer2(F.relu(self.batch_norm2(op)))
        op = torch.cat([op,inp], 1)
        return op
class TransBlock(nn.Module):
    def __init__(self, input_num_planes, output_num_planes):
        super(TransBlock, self).__init__()
        self.batch_norm = nn.BatchNorm2d(input_num_planes)
        self.conv_layer = nn.Conv2d(in_channels=input_num_planes, 
                                    out_channels=output_num_planes, 
                                    kernel_size=1, bias=False)
    def forward(self, inp):
        op = self.conv_layer(F.relu(self.batch_norm(inp)))
        op = F.avg_pool2d(op, 2)
        return op

These blocks are then stacked densely to form the overall DenseNet architecture. DenseNet, like ResNet, comes in variants, such as DenseNet121, DenseNet161, DenseNet169, and DenseNet201, where the numbers represent the total number of layers. Such large numbers of layers are obtained by the repeated stacking of the dense and transition blocks plus a fixed 7x7 convolutional layer at the input end and a fixed fully connected layer at the output end. PyTorch provides pretrained models for all of these variants:

import torchvision.models as models
densenet121 = models.densenet121(pretrained=True)
densenet161 = models.densenet161(pretrained=True)
densenet169 = models.densenet169(pretrained=True)
densenet201 = models.densenet201(pretrained=True)

DenseNet outperforms all the models discussed so far on the ImageNet dataset. Various hybrid models have been developed by mixing and matching the ideas presented in the previous sections. The Inception-ResNet and ResNeXt models are examples of such hybrid networks. The following diagram shows the ResNeXt architecture:

Figure 3.29 – ResNeXt architecture
Figure 3.29 – ResNeXt architecture

Figure 2.19 – ResNeXt architecture

As you can see, it looks like a wider variant of a ResNet + Inception hybrid because there is a large number of parallel convolutional branches in the residual blocks – and the idea of parallelism is derived from the inception network.

In the next and last section of this chapter, we are going to look at the one of the best-performing CNN architectures till date – EfficientNets. We will also discuss the future of CNN architectural development while touching upon the use of CNN architectures for tasks beyond image classification.