Hands-On Deep Learning Algorithms with Python

By: Sudharsan Ravichandiran

Overview of this book

Deep learning is one of the most popular domains in the AI space, allowing you to develop multi-layered models of varying complexity. This book introduces you to popular deep learning algorithms, from basic to advanced, and shows you how to implement them from scratch using TensorFlow. Throughout the book, you will gain insights into each algorithm, the mathematical principles involved, and how to implement it in the best possible manner. The book starts by explaining how you can build your own neural networks, followed by introducing you to TensorFlow, the powerful Python-based library for machine learning and deep learning. Moving on, you will get up to speed with gradient descent variants, such as NAG, AMSGrad, AdaDelta, Adam, and Nadam. The book will then provide you with insights into recurrent neural networks (RNNs) and LSTM, and show you how to generate song lyrics with an RNN. Next, you will master the math necessary to work with convolutional and capsule networks, which are widely used for image recognition tasks. You will also learn how machines understand the semantics of words and documents using CBOW, skip-gram, and PV-DM. Finally, you will explore GANs, including InfoGAN and LSGAN, and autoencoders, such as contractive autoencoders and VAE. By the end of this book, you will be equipped with all the skills you need to implement deep learning in your own projects.
Table of Contents (17 chapters)
Section 1: Getting Started with Deep Learning
Section 2: Fundamental Deep Learning Algorithms
Section 3: Advanced Deep Learning Algorithms

What are CNNs?

A CNN, also known as a ConvNet, is one of the most widely used deep learning algorithms for computer vision tasks. Let's say we are performing an image-recognition task. Consider the following image. We want our CNN to recognize that it contains a horse:

[Image: input image containing a horse]

How can we do that? When we feed the image to a computer, the computer represents it as a matrix of pixel values. For a standard 8-bit image, each pixel value ranges from 0 to 255, and the dimensions of this matrix are [image width x image height x number of channels]. A grayscale image has one channel, and a colored image has three channels: red, green, and blue (RGB).
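To make the pixel-matrix idea concrete, here is a minimal sketch using NumPy. The two tiny 2 x 2 images and their pixel values are hypothetical and chosen only for illustration; NumPy itself is an assumption, since the text does not name a library at this point. Note also that NumPy-based tools conventionally order the axes as (height, width, channels).

```python
import numpy as np

# Hypothetical 2 x 2 grayscale image: 8-bit values between 0 and 255.
gray = np.array([[0, 128],
                 [200, 255]], dtype=np.uint8)
gray = gray[..., np.newaxis]  # make the single channel explicit

# Hypothetical 2 x 2 color image: the last axis holds (R, G, B).
rgb = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

print(gray.shape)            # (2, 2, 1): one channel
print(rgb.shape)             # (2, 2, 3): three channels
print(rgb.min(), rgb.max())  # 0 255
```

The uint8 data type is what enforces the 0 to 255 range here; images stored with other bit depths would use a different range.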

Let's say we have a colored input image with a width of 11 and a height of 11, that is, 11 x 11. The dimensions of our matrix would then be [11 x 11 x 3]. Here, 11 x 11 represents the image width and height, and 3 represents the number of channels...
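The [11 x 11 x 3] case can be sketched the same way. This is an illustrative example only: the image is filled with random values (a stand-in, not a real picture), and the variable names are hypothetical.

```python
import numpy as np

# Seeded generator so the stand-in pixel values are reproducible.
rng = np.random.default_rng(0)

# Random 8-bit color image; the upper bound 256 is exclusive, giving 0..255.
image = rng.integers(0, 256, size=(11, 11, 3), dtype=np.uint8)

print(image.shape)  # (11, 11, 3)
print(image.size)   # 363, that is, 11 * 11 * 3 values in total

# Slicing the last axis yields one 11 x 11 matrix per channel.
red, green, blue = image[..., 0], image[..., 1], image[..., 2]
print(red.shape)    # (11, 11)
```

Each channel is itself an 11 x 11 matrix, so the color image can be viewed as three such matrices stacked along the last axis.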