Intelligent Projects Using Python

By : Santanu Pattanayak

Overview of this book

This book is a practical companion for building projects from leading AI domains using Python. It covers the detailed implementation of projects from the core disciplines of AI. We start with the basics of creating smart systems using machine learning and deep learning techniques. You will work with neural network architectures such as CNNs, RNNs, and LSTMs to solve real-world problems. You will learn to train a model to detect diabetic retinopathy in the human eye and to create an intelligent system that performs video-to-text translation. You will apply transfer learning in the healthcare domain and implement style transfer using GANs. Later, you will build AI-based recommendation systems, a mobile app for sentiment analysis, and a chatbot for customer service. You will apply AI techniques in the cybersecurity domain to generate CAPTCHAs, and then train an autonomous vehicle to drive itself using reinforcement learning. Throughout, you will use libraries from the Python ecosystem, such as TensorFlow and Keras, to bring together the core aspects of machine learning, deep learning, and AI. By the end of this book, you will be able to build your own smart models for a wide range of AI problems.

Preface

Who this book is for

What this book covers

To get the most out of this book

Get in touch

Foundations of Artificial Intelligence Based Systems

Neural networks

Neural activation units

The backpropagation method of training neural networks

Convolutional neural networks

Recurrent neural networks (RNNs)

Generative adversarial networks

Reinforcement learning

Transfer learning

Restricted Boltzmann machines

Autoencoders

Summary

Transfer Learning

Technical requirements

Introduction to transfer learning

Transfer learning and detecting diabetic retinopathy

The diabetic retinopathy dataset

Formulating the loss function

Taking class imbalances into account

Preprocessing the images

Additional data generation using affine transformation

Network architecture

The optimizer and initial learning rate

Cross-validation

Model checkpoints based on validation log loss

Python implementation of the training process

Results from the categorical classification

Inference at testing time

Performing regression instead of categorical classification

Using the keras sequential utils as generator

Summary

Neural Machine Translation

Technical requirements

Rule-based machine translation

Statistical machine-learning systems

Neural machine translation

Implementing a sequence-to-sequence neural translation machine

Summary

Style Transfer in Fashion Industry using GANs

Technical requirements

DiscoGAN

CycleGAN

Learning to generate natural handbags from sketched outlines

Preprocess the Images

The generators of the DiscoGAN

The discriminators of the DiscoGAN

Building the network and defining the cost functions

Building the training process

Important parameter values for GAN training

Invoking the training

Monitoring the generator and the discriminator loss

Sample images generated by DiscoGAN

Summary

Video Captioning Application

Technical requirements

CNNs and LSTMs in video captioning

A sequence-to-sequence video-captioning system

Data for the video-captioning system

Processing video images to create CNN features

Processing the labelled captions of the video

Building the train and test dataset

Building the model

Creating a word vocabulary for the captions

Training the model

Training results

Inference with unseen test videos

Summary

The Intelligent Recommender System

Technical requirements

What is a recommender system?

Latent factorization-based recommendation system

Deep learning for latent factor collaborative filtering

SVD++

Restricted Boltzmann machines for recommendation

Contrastive divergence

Collaborative filtering using RBMs

Collaborative filtering implementation using RBM

Inference using the trained RBM

Summary

Mobile App for Movie Review Sentiment Analysis

Technical requirements

Building an Android mobile app using TensorFlow mobile

Movie review rating in an Android app

Preprocessing the movie review text

Building the model

Training the model

Freezing the model to a protobuf format

Creating a word-to-token dictionary for inference

App interface page design

The core logic of the Android app

Testing the mobile app

Summary

Conversational AI Chatbots for Customer Service

Technical requirements

Chatbot architecture

A sequence-to-sequence model using an LSTM

Building a sequence-to-sequence model

Customer support on Twitter

Summary

Autonomous Self-Driving Car Through Reinforcement Learning

Technical requirements

Markov decision process

Learning the Q value function

Deep Q learning

Formulating the cost function

Double deep Q learning

Implementing an autonomous self-driving car

Discretizing actions for deep Q learning

Implementing the Double Deep Q network

Designing the agent

The environment for the self-driving car

Putting it all together

Results from the training

Summary

CAPTCHA from a Deep-Learning Perspective

Technical requirements

Breaking CAPTCHAs with deep learning

CAPTCHA generation through adversarial learning

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think

Convolutional neural networks

Convolutional neural networks (CNNs) use convolution operations to extract useful information from data that has an associated topology, which makes them well suited to image and audio data. When an input image passes through a convolutional layer, it produces several output images, known as output feature maps, each of which detects a feature. The output feature maps in the initial convolutional layer may learn to detect basic features, such as edges and variations in color composition.

The second convolutional layer may detect slightly more complicated features, such as squares, circles, and other geometrical structures. As we progress through the neural network, the convolutional layers learn to detect more and more complicated features. For instance, if we have a CNN that classifies whether an image is of a cat or a dog, the deeper convolutional layers, those closer to the output, might learn to detect features such as the head, the legs, and so on.

Figure 1.11 shows an architectural diagram of a CNN that processes images of cats and dogs in order to classify them. The images are passed through a convolutional layer that helps to detect relevant features, such as edges and color composition. The ReLU activations add nonlinearity. The pooling layer that follows the activation layer summarizes local neighborhood information in order to provide some degree of translational invariance. In a typical CNN, this convolution-activation-pooling operation is performed several times before the network reaches the dense connections:

Figure 1.11: CNN architecture

As we go through such a network with several convolution-activation-pooling operations, the spatial resolution of the image is reduced, while the number of output feature maps typically increases from layer to layer. Each output feature map in a convolutional layer is associated with a filter kernel, the weights of which are learned through the CNN training process.
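As a rough illustration of this trend, the sketch below traces the shape of the data through a stack of convolution-activation-pooling blocks. It assumes that each convolution preserves the spatial size, that each pooling step halves it, and that the input has three color channels. The function name `trace_shapes` and the filter counts are hypothetical choices for this example, not values taken from Figure 1.11.

```python
def trace_shapes(height, width, filters_per_block, in_channels=3):
    """Return the (height, width, channels) shape after each block.

    Assumption: every block is a size-preserving convolution followed by
    a 2 x 2 pooling with a stride of two, so height and width are halved.
    """
    shapes = [(height, width, in_channels)]
    for n_filters in filters_per_block:
        # Pooling halves the spatial resolution.
        height, width = height // 2, width // 2
        # The convolution produces one feature map per filter.
        shapes.append((height, width, n_filters))
    return shapes


# Hypothetical filter counts for three blocks.
shapes = trace_shapes(224, 224, [32, 64, 128])
for shape in shapes:
    print(shape)
```

The spatial dimensions shrink from 224 x 224 to 28 x 28, while the number of feature maps grows from 3 to 128.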

In a convolutional operation, a flipped version of a filter kernel is slid over the entire image or feature map, and at each location the dot product of the filter-kernel weights with the corresponding image pixel or feature map values is computed. Readers who are already accustomed to ordinary image processing may have used filter kernels such as a Gaussian filter or a Sobel edge detection filter, where the weights of the filters are predefined. The advantage of convolutional neural networks is that the filter weights are determined through the training process. This means that the filters are better customized for the problem that the convolutional neural network is dealing with.
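The operation described above can be sketched in plain Python. This is a minimal illustration that assumes a single-channel input, no padding, and a stride of one. The helper names `flip_kernel` and `conv2d` and the small test image are hypothetical. Note that deep learning libraries commonly skip the flip and compute a cross-correlation instead, which makes no practical difference when the weights are learned.

```python
def flip_kernel(kernel):
    """Rotate a 2D kernel by 180 degrees (reverse rows and columns)."""
    return [row[::-1] for row in kernel[::-1]]


def conv2d(image, kernel):
    """Convolve a 2D list with a 2D kernel (no padding, stride of one)."""
    flipped = flip_kernel(kernel)
    k_h, k_w = len(flipped), len(flipped[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product of the flipped kernel with the patch at (i, j).
            total = sum(
                flipped[a][b] * image[i + a][j + b]
                for a in range(k_h)
                for b in range(k_w)
            )
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4 x 4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Predefined Sobel kernel that responds to horizontal intensity changes.
sobel_x = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]
edges = conv2d(image, sobel_x)
print(edges)
```

Every output value is nonzero here because each 3 x 3 patch of this small image straddles the edge. In a CNN, the entries of the kernel would be trainable weights rather than the fixed Sobel values.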

When a convolutional operation involves overlaying the filter kernel on every location of the input, the convolution is said to have a stride of one. If we choose to skip one location while overlaying the filter kernel, then convolution is performed with a stride of two. In general, if n locations are skipped while overlaying the filter kernel over the input, the convolution is said to have been performed with a stride of (n+1). Strides greater than one reduce the spatial dimensions of the output of the convolution.
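The effect of the stride on the output size can be checked with a small helper. This is a sketch using the standard output-size formula along one spatial dimension. The function name `conv_output_size` is hypothetical, and the `padding` parameter (the number of zeros added on each side of the input) is an assumption introduced here for completeness, since the text does not discuss padding.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one spatial dimension of a convolution."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# A 3 x 3 kernel over a 224-pixel-wide input, without padding.
size_stride_1 = conv_output_size(224, 3, stride=1)
size_stride_2 = conv_output_size(224, 3, stride=2)

# With one pixel of zero padding on each side (an assumption).
size_padded_stride_1 = conv_output_size(224, 3, stride=1, padding=1)
size_padded_stride_2 = conv_output_size(224, 3, stride=2, padding=1)

print(size_stride_1, size_stride_2)
print(size_padded_stride_1, size_padded_stride_2)
```

With padding of one, a stride of one keeps the width at 224, while a stride of two halves it to 112.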

Generally, a convolutional layer is followed by a pooling layer, which summarizes the output feature map activations in a neighborhood determined by the receptive field of the pooling. For instance, a 2 x 2 receptive field gathers the local information of four neighboring output feature map activations. For max-pooling operations, the maximum value of the four activations is selected as the output, while for average pooling, the average of the four activations is used. Pooling reduces the spatial resolution of the feature maps. For instance, applying a pooling operation with a 2 x 2 receptive field and a stride of two to a 224 x 224 feature map reduces its spatial dimensions to 112 x 112.
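A minimal sketch of both pooling variants follows. It assumes non-overlapping windows, that is, a stride equal to the receptive field size. The function name `pool2d` and the sample feature map are hypothetical.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Pool a 2D list over non-overlapping size x size windows."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            # Collect the activations inside the receptive field.
            window = [
                feature_map[i + a][j + b]
                for a in range(size)
                for b in range(size)
            ]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


# Hypothetical 4 x 4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [5, 7, 1, 1],
    [0, 2, 4, 8],
    [2, 0, 6, 2],
]
max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="avg")
print(max_pooled)
print(avg_pooled)

# A 224 x 224 feature map is reduced to 112 x 112.
large_map = [[0] * 224 for _ in range(224)]
large_pooled = pool2d(large_map)
print(len(large_pooled), len(large_pooled[0]))
```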

One thing to note is that a convolutional layer needs far fewer weights than a fully connected layer. For instance, if we have an input image of spatial dimensions 224 x 224 and the desired output of the next layer is also of dimensions 224 x 224, then for a traditional neural network with full connections, the number of weights to be learned is 224 x 224 x 224 x 224. For a convolutional layer with the same input and output dimensions, all that we need to learn are the weights of the filter kernel. So, if we use a single 3 x 3 filter kernel, we just need to learn nine weights as opposed to 224 x 224 x 224 x 224 weights. This simplification works because values within a local spatial neighborhood of structures such as images and audio are highly correlated, so the same small filter can be reused at every location.
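The arithmetic behind this comparison can be written out directly. The sketch below ignores bias terms, and the function names are hypothetical. The `in_channels` and `out_channels` parameters are assumptions added to show how the count grows for multi-channel layers. With both set to one, it reproduces the nine weights mentioned in the text.

```python
def dense_weight_count(in_h, in_w, out_h, out_w):
    """Weights in a fully connected layer between two 2D grids (no biases)."""
    return (in_h * in_w) * (out_h * out_w)


def conv_weight_count(kernel_h, kernel_w, in_channels=1, out_channels=1):
    """Weights in a convolutional layer (no biases)."""
    return kernel_h * kernel_w * in_channels * out_channels


dense_weights = dense_weight_count(224, 224, 224, 224)
conv_weights = conv_weight_count(3, 3)

print(dense_weights)
print(conv_weights)
```

The fully connected layer needs about 2.5 billion weights, while the single 3 x 3 filter needs nine.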

The input images pass through several layers of convolutional and pooling operations. As we move deeper into the network, the number of feature maps increases, while the spatial resolution of the images decreases. At the end of the convolutional-pooling layers, the output feature maps are fed to the fully connected layers, followed by the output layer.

The output units are dependent on the task at hand. If we are performing regression, the output activation unit is linear, while if it is a binary classification problem, the output unit is a sigmoid. For multi-class classification, the output layer is a softmax unit.
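The three output activations mentioned above can be sketched with the standard library alone. The function names and the sample logits are hypothetical. Subtracting the maximum logit inside `softmax` is a common numerical-stability step and does not change the result.

```python
import math


def linear(z):
    """Identity activation, used for regression outputs."""
    return z


def sigmoid(z):
    """Squash a single logit into (0, 1) for binary classification."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    """Turn a list of logits into class probabilities that sum to one."""
    largest = max(logits)
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


regression_output = linear(3.2)
binary_probability = sigmoid(0.0)
class_probabilities = softmax([2.0, 1.0, 0.1])

print(regression_output)
print(binary_probability)
print(class_probabilities)
```

The class with the largest logit receives the largest softmax probability, so the predicted class is the index of the maximum value.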

In all of the image processing projects in this book, we will use convolutional neural networks, in one form or another.
