Modern Computer Vision with PyTorch - Second Edition

By : V Kishore Ayyadevara, Yeshwanth Reddy

Overview of this book

Whether you are a beginner or are looking to progress in your computer vision career, this book guides you through the fundamentals of neural networks (NNs) and PyTorch and how to implement state-of-the-art architectures for real-world tasks. The second edition of Modern Computer Vision with PyTorch is fully updated to explain and provide practical examples of the latest multimodal models, CLIP, and Stable Diffusion. You’ll discover best practices for working with images, tweaking hyperparameters, and moving models into production. As you progress, you'll implement various use cases for facial keypoint recognition, multi-object detection, segmentation, and human pose detection. This book provides a solid foundation in image generation as you explore different GAN architectures. You’ll leverage transformer-based architectures like ViT, TrOCR, BLIP2, and LayoutLM to perform various real-world tasks and build a diffusion model from scratch. Additionally, you’ll utilize foundation models' capabilities to perform zero-shot object detection and image segmentation. Finally, you’ll learn best practices for deploying a model to production. By the end of this deep learning book, you'll confidently leverage modern NN architectures to solve real-world computer vision problems.
Table of Contents (27 chapters)

  • Section 1: Fundamentals of Deep Learning for Computer Vision
  • Section 2: Object Classification and Detection
  • Section 3: Image Manipulation
  • Section 4: Combining Computer Vision with Other Techniques
  • Other Books You May Enjoy
  • Index

Exploring the Mask R-CNN architecture

The Mask R-CNN architecture identifies and highlights the individual instances of objects of a given class within an image. This is especially useful when an image contains multiple objects of the same type. The term Mask refers to the pixel-level segmentation that Mask R-CNN performs.
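To make the idea of per-instance, pixel-level masks concrete, here is a minimal sketch in plain Python. It is an illustration only: the toy 4x6 image, the two instances of the same (hypothetical) class, and the helper names `mask_area`, `mask_to_box`, and `merge_masks` are assumptions made for this example and do not come from the book.

```python
# Toy 4x6 image with two objects of the same (hypothetical) class.
# Each instance has its own binary mask: 1 = pixel belongs to that instance.
instance_masks = [
    [[1, 1, 0, 0, 0, 0],
     [1, 1, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0]],
    [[0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 1],
     [0, 0, 0, 0, 1, 1],
     [0, 0, 0, 0, 1, 1]],
]


def mask_area(mask):
    """Number of pixels that belong to the instance."""
    return sum(sum(row) for row in mask)


def mask_to_box(mask):
    """Tightest (x_min, y_min, x_max, y_max) box around the mask's pixels."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, value in enumerate(row) if value]
    return (min(xs), min(ys), max(xs), max(ys))


def merge_masks(masks):
    """Union of all masks: a class-level mask in which instance identity is lost."""
    height, width = len(masks[0]), len(masks[0][0])
    return [[int(any(m[y][x] for m in masks)) for x in range(width)]
            for y in range(height)]


areas = [mask_area(m) for m in instance_masks]
boxes = [mask_to_box(m) for m in instance_masks]
class_mask = merge_masks(instance_masks)
```

Keeping one mask per object lets each instance be measured or boxed separately, whereas the merged mask only says which pixels belong to the class.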

The Mask R-CNN architecture extends the Faster R-CNN network, which we learned about in the previous chapter, with the following modifications:

  • The RoI Pooling layer has been replaced with the RoI Align layer.
  • A mask head has been added to predict object masks, alongside the existing head that predicts object classes and bounding-box corrections in the final layer.
  • A fully convolutional network (FCN) is leveraged for mask prediction.
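The first modification can be illustrated with a small sketch in plain Python. It assumes a simplified setting that is not taken from the book: a single-channel feature map stored as nested lists, RoIs given as (y1, x1, y2, x2) in feature-map coordinates, a feature value at index (y, x) treated as located at coordinate (y, x), and a pooling variant that snaps the RoI outward to whole cells. The function names are hypothetical. In practice, torchvision provides `torchvision.ops.roi_pool` and `torchvision.ops.roi_align`, whose coordinate conventions differ in detail from this sketch.

```python
import math


def bilinear_sample(feature_map, y, x):
    """Interpolate the feature map at a fractional (y, x) location."""
    h, w = len(feature_map), len(feature_map[0])
    y0, x0 = math.floor(y), math.floor(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature_map[y0][x0] * (1 - dx) + feature_map[y0][x1] * dx
    bottom = feature_map[y1][x0] * (1 - dx) + feature_map[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature_map, roi, out_size=2, samples=2):
    """Average bilinear samples on a regular grid inside each output bin.

    No coordinate is rounded, so sub-cell shifts of the RoI change the output.
    """
    y1, x1, y2, x2 = roi
    bin_h, bin_w = (y2 - y1) / out_size, (x2 - x1) / out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0.0
            for sy in range(samples):
                for sx in range(samples):
                    y = y1 + (i + (sy + 0.5) / samples) * bin_h
                    x = x1 + (j + (sx + 0.5) / samples) * bin_w
                    total += bilinear_sample(feature_map, y, x)
            row.append(total / (samples * samples))
        out.append(row)
    return out


def roi_pool(feature_map, roi, out_size=2):
    """Simplified RoI pooling: snap the RoI to whole cells, then max-pool bins."""
    y1, x1 = math.floor(roi[0]), math.floor(roi[1])
    y2, x2 = math.ceil(roi[2]), math.ceil(roi[3])
    h, w = y2 - y1, x2 - x1
    out = []
    for i in range(out_size):
        rows = range(y1 + (i * h) // out_size,
                     y1 + math.ceil((i + 1) * h / out_size))
        row = []
        for j in range(out_size):
            cols = range(x1 + (j * w) // out_size,
                         x1 + math.ceil((j + 1) * w / out_size))
            row.append(max(feature_map[y][x] for y in rows for x in cols))
        out.append(row)
    return out


# Toy 4x4 feature map whose value encodes its position: f[y][x] = 10*y + x.
fmap = [[10 * y + x for x in range(4)] for y in range(4)]
roi_a = (0.5, 0.5, 2.5, 2.5)
roi_b = (0.9, 0.9, 2.9, 2.9)  # roi_a shifted by 0.4 of a cell

pooled_a, pooled_b = roi_pool(fmap, roi_a), roi_pool(fmap, roi_b)
aligned_a, aligned_b = roi_align(fmap, roi_a), roi_align(fmap, roi_b)
```

In this toy setup, both RoIs snap to the same cells, so the pooled outputs are identical even though the boxes differ. The aligned outputs differ because sampling happens at the exact fractional locations, which is the kind of spatial precision that pixel-level mask prediction benefits from.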

Let’s have a quick look at the events that occur within Mask R-CNN...
