Hands-On Transfer Learning with Python

By: Dipanjan Sarkar, Nitin Panwar, Raghav Bali, Tamoghna Ghosh

Overview of this book

Transfer learning is a machine learning (ML) technique where knowledge gained while solving one set of problems can be used to solve other, similar problems. The purpose of this book is two-fold: first, we focus on detailed coverage of deep learning (DL) and transfer learning, comparing and contrasting the two with easy-to-follow concepts and examples. The second area of focus is real-world examples and research problems using TensorFlow, Keras, and the Python ecosystem, with hands-on examples. The book starts with the key essential concepts of ML and DL, followed by coverage of important DL architectures such as convolutional neural networks (CNNs), deep neural networks (DNNs), recurrent neural networks (RNNs), long short-term memory (LSTM), and capsule networks. Our focus then shifts to transfer learning concepts, such as model freezing, fine-tuning, and pre-trained models including VGG, Inception, and ResNet, and how such systems can perform better than DL models trained from scratch, with practical examples. In the concluding chapters, we will focus on a multitude of real-world case studies and problems in areas such as computer vision, audio analysis, and natural language processing (NLP). By the end of this book, you will be able to implement both DL and transfer learning principles in your own systems.

ML techniques

ML is a popular subfield of AI, one which covers a very wide scope. One of the reasons for this popularity is the comprehensive toolbox of sophisticated algorithms, techniques, and methodologies in its ambit. This toolbox has been developed and improved over the years, and new techniques are being researched on an ongoing basis. To understand and use the ML toolbox wisely, consider the following ways of categorizing it.

Categorization based on amount of human supervision:

  • Supervised learning: This class of learning involves a high degree of human supervision. The algorithms under supervised learning utilize the training data and associated outputs to learn a mapping between the two, and apply that mapping to unseen data. Classification and regression are the two major types of supervised learning algorithms.
  • Unsupervised learning: This class of algorithms attempts to learn inherent latent structures, patterns, and relationships from the input data without any associated outputs/labels (human supervision). Clustering, dimensionality reduction, association rule mining, and so on are a few major types of unsupervised learning algorithms.
  • Semi-supervised learning: This class of algorithms is a hybrid of supervised and unsupervised learning. In this case, the algorithms work with small amounts of labeled training data and larger amounts of unlabeled data, thus making creative use of both supervised and unsupervised methods to solve a given task.
  • Reinforcement learning: This class of algorithms is a bit different from supervised and unsupervised learning methods. The central entity here is an agent, which trains over a period of time while interacting with its environment in order to maximize some notion of reward. The agent iteratively learns and adapts its strategies/policies based on the rewards and penalties resulting from these interactions.

Categorization based on data availability:

  • Batch learning: This is also termed offline learning. This type of learning is utilized when all of the required training data is available up front, so a model can be trained and fine-tuned before being deployed into production/the real world.
  • Online learning: As the name suggests, in this case learning does not stop once the initial data has been consumed. Rather, data is fed into the system in mini-batches, and the training process continues with each new batch of data (see the sketch after this list).
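To make the batch/online distinction concrete, here is a minimal sketch of online learning using scikit-learn's SGDClassifier, whose partial_fit method consumes data one mini-batch at a time. The simulated data stream, its dimensions, and the labeling rule are illustrative assumptions, not from the book:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(42)
classes = np.array([0, 1])  # full label set, required on the first partial_fit call
model = SGDClassifier(random_state=42)

for step in range(10):
    # Simulate a mini-batch of 100 samples with 5 features arriving over time.
    X_batch = rng.randn(100, 5)
    y_batch = (X_batch[:, 0] + X_batch[:, 1] > 0).astype(int)
    # partial_fit updates the model incrementally instead of retraining from scratch.
    model.partial_fit(X_batch, y_batch, classes=classes)

print(model.predict(rng.randn(3, 5)))
```

Batch learning, by contrast, would gather all the data first and call fit once on the full dataset.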

The previously discussed categorizations give us an abstract view of how ML algorithms can be organized, understood, and utilized. The most common way to categorize them is into supervised and unsupervised learning algorithms. Let's go into a bit more detail about these two categories, as this should help us prepare for the more advanced topics introduced later.

Supervised learning

Supervised learning algorithms are a class of algorithms that utilize data samples (also called training samples) and corresponding outputs (or labels) to infer a mapping function between the two. The inferred mapping function, also called the learned function, is the output of this training process. The learned function can then be used to map new, unseen data points (input elements) to outputs, which is also how its performance is tested.

Some key concepts for supervised learning algorithms are as follows:

  • Training dataset: The training samples and corresponding outputs utilized during the training process are termed training data. Formally, a training dataset is a collection of two-element tuples, each consisting of an input element (usually a vector) and a corresponding output element or signal.
  • Test dataset: The unseen dataset that is utilized to test the performance of the learned function. This dataset is also a collection of two-element tuples containing input data points and corresponding output signals. Data points in this set are not used during the training phase (in practice, a validation set is also carved out of the labeled data; we will discuss this in more detail in subsequent chapters).
  • Learned function: This is the output of the training phase, also termed the inferred function or the model. This function is inferred from the training examples (input data points and their corresponding outputs) in the training dataset. An ideal model/learned function learns the mapping in such a way that its predictions generalize to unseen data as well. The sketch after this list ties these three terms together.
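Here is a minimal sketch of these three pieces using scikit-learn; the choice of library and of the Iris dataset is ours, for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# A labeled dataset: X holds the input vectors, y the corresponding output signals.
X, y = load_iris(return_X_y=True)

# Hold out 30% of the data points as the unseen test dataset.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Fit on the training dataset; the fitted estimator is the learned function/model.
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Check how well the learned function generalizes to data it never saw.
print("Test accuracy:", model.score(X_test, y_test))
```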

There are various supervised learning algorithms available. Based on the use case requirements, they can be broadly categorized into classification and regression models.

Classification

In the simplest terms, these algorithms help us answer objective questions by making yes/no style predictions. For instance, they are useful in scenarios such as is it going to rain today? or is this tumour cancerous?, and so on.

Formally, the key objective of classification algorithms is to predict output labels that are categorical in nature, based on the input data points; that is, each output label belongs to a discrete class or category.

Logistic regression, Support Vector Machines (SVMs), Neural Networks, Random Forests, k-Nearest Neighbours (KNN), Decision Trees, and so on are some of the popular classification algorithms.

Suppose we have a real-world use case of evaluating different car models. To keep things simple, let's assume that the model is expected to classify every car model as either acceptable or unacceptable, learning from multiple input training samples. The input training samples have attributes such as buying price, number of doors, capacity (in number of persons), and safety.

The class label denotes each data point as either acceptable or unacceptable. The following diagram depicts the binary classification problem at hand. A classification algorithm takes the training samples as input to prepare a supervised model, and this model is then utilized to predict the evaluation label for a new data point:

Supervised learning: Binary classification for car model evaluation

Since output labels are discrete classes in classification problems, the task is termed a binary classification problem if there are only two possible output classes, and a multi-class classification problem otherwise. Predicting whether it will rain tomorrow would be a binary classification problem (with the output being a yes or a no), while predicting a numeric digit from scanned handwritten images would be a multi-class classification problem with 10 labels (zero to nine being the possible output labels). A minimal sketch of both flavors follows.
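The following sketch uses two of scikit-learn's bundled datasets: breast cancer diagnosis as a binary problem and handwritten digits as a 10-class problem. The choice of logistic regression and of these datasets is our own, for illustration:

```python
from sklearn.datasets import load_breast_cancer, load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Binary classification: only two possible output classes (malignant/benign).
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)
binary_clf = make_pipeline(StandardScaler(),
                           LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)
print("Binary accuracy:     ", binary_clf.score(X_te, y_te))

# Multi-class classification: ten possible output labels (the digits 0-9).
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)
multi_clf = make_pipeline(StandardScaler(),
                          LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)
print("Multi-class accuracy:", multi_clf.score(X_te, y_te))
```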

Regression

This class of supervised learning algorithms helps us answer quantitative questions of the type how many? or how much?. Formally, the key objective of regression models is value estimation. In this case, the output labels are continuous in nature (as opposed to being discrete in classification).

In the case of regression problems, the input data points are termed independent or explanatory variables, while the output is termed the dependent variable. Regression models are also trained using training data samples consisting of input (independent) data points along with output (dependent) signals. Linear regression, multivariate regression, regression trees, and so on are a few supervised regression algorithms.

Regression models can be further categorized based on how they model the relationship between dependent and independent variables.

Simple linear regression models work with a single independent variable and a single dependent variable. Ordinary Least Squares (OLS) regression is a popular linear regression model. Multiple regression, or multivariate regression, is where there is a single dependent variable, while each observation is a vector composed of multiple explanatory variables.

Polynomial regression models are a special case of multivariate regression, where the dependent variable is modeled as an nth degree polynomial of the independent variable. Since polynomial regression models fit or map nonlinear relationships between dependent and independent variables, they are also termed nonlinear regression models.

The following is an example of linear regression:

Supervised learning: Linear regression

To understand different regression types, let's consider a real-world use case of estimating the stopping distance of a car, based on its speed. Here, based on the training data we have, we can model the stopping distance as a linear function of speed or as a polynomial function of the speed of the car. Remember that the main objective is to minimize the error without overfitting the training data itself.

The preceding graph depicts a linear fit, while the following one depicts a polynomial fit for the same dataset:

Supervised learning: Polynomial regression
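Here is a minimal sketch of the stopping-distance idea, fitting both a simple linear model and a degree-2 polynomial model with scikit-learn. The synthetic data-generating formula and noise level are illustrative assumptions, not real measurements:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic speed (km/h) versus stopping distance (m); quadratic ground truth.
rng = np.random.RandomState(0)
speed = rng.uniform(10, 120, size=80).reshape(-1, 1)
distance = 0.005 * speed[:, 0] ** 2 + 0.2 * speed[:, 0] + rng.normal(0, 2, 80)

# Simple linear regression: one independent, one dependent variable (OLS fit).
linear = LinearRegression().fit(speed, distance)

# Polynomial regression: expand speed into [1, speed, speed^2], then fit OLS.
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly.fit(speed, distance)

print("Linear R^2:    ", linear.score(speed, distance))
print("Polynomial R^2:", poly.score(speed, distance))
```

Note that the polynomial pipeline simply runs ordinary least squares on the expanded feature vector, which is exactly why polynomial regression can be viewed as a special case of multiple regression.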

Unsupervised learning

As the name suggests, this class of algorithms learns/infers concepts without supervision. Unlike supervised learning algorithms, which infer a mapping function based on a training dataset of input data points and output signals, unsupervised algorithms are tasked with finding patterns and relationships in the training data without any output signals being available. This class of algorithms utilizes the input dataset to detect patterns, mine rules, or group/cluster data points so as to extract meaningful insights from the raw input dataset.

Unsupervised algorithms come in handy when we do not have the luxury of a training set containing corresponding output signals or labels. In many real-world scenarios, datasets are available without output signals, and it is difficult to manually label them; thus, unsupervised algorithms are helpful in plugging such gaps.

Similar to supervised learning algorithms, unsupervised algorithms can also be categorized for ease of understanding and learning. The following are different categories of unsupervised learning algorithms.

Clustering

The unsupervised equivalent of classification is termed clustering. These algorithms help us cluster or group data points into different groups or categories, without any output labels being available in the input/training dataset. These algorithms try to find patterns and relationships in the input dataset, utilizing inherent features to group data points based on some similarity measure, as shown in the following diagram:

Unsupervised learning: Clustering news articles

A real-world example to help understand clustering could be news articles. There are hundreds of news articles written daily, each catering to different topics ranging from politics and sports to entertainment, and so on. An unsupervised approach to group these articles together can be achieved using clustering, as shown in the preceding figure.

There are different approaches to performing clustering. The most popular ones are listed as follows, with a short code sketch after the list:

  • Centroid-based methods such as K-means and K-medoids.
  • Hierarchical clustering methods, both agglomerative and divisive; Ward's method is a popular example.
  • Exemplar-based methods such as affinity propagation.
  • Data-distribution-based methods, for instance, Gaussian mixture models.
  • Density-based methods such as DBSCAN.
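To make this concrete, here is a minimal clustering sketch in the spirit of the news-article example: TF-IDF features plus K-means, a centroid-based method. The toy headlines are invented for illustration:

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy headlines standing in for real news articles.
articles = [
    "Election results spark debate in parliament",
    "Government announces new tax policy",
    "Star striker scores hat-trick in cup final",
    "Local team wins championship after penalty shootout",
    "Blockbuster movie breaks box office records",
    "Award-winning actor announces new film project",
]

# Turn raw text into TF-IDF feature vectors; no labels are involved anywhere.
X = TfidfVectorizer(stop_words="english").fit_transform(articles)

# Group the articles into three clusters based on feature similarity.
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
for label, article in zip(km.labels_, articles):
    print(label, article)
```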

Dimensionality reduction

Data and ML are the best of friends, yet a number of issues come with more and bigger data. A large number of attributes, or a bloated feature space, is one common problem. A large feature space poses problems in analyzing and visualizing the data, along with issues related to training time, memory, and space constraints. This is also known as the curse of dimensionality. Since unsupervised methods help us extract insights and patterns from unlabeled training datasets, they are also useful in reducing dimensionality.

In other words, unsupervised methods help us reduce the feature space by selecting a representative set of features from the complete available list:

Unsupervised learning: Dimensionality reduction using PCA

Principal Component Analysis (PCA), nearest neighbors, and discriminant analysis are some of the popular dimensionality reduction techniques.

The preceding diagram is a famous depiction of how PCA-based dimensionality reduction works. It shows data arranged in a Swiss roll shape in three-dimensional space. Applying PCA transforms the data into two-dimensional space, as shown on the right-hand side of the diagram.
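The following is a minimal sketch reproducing this setup with scikit-learn. Note that PCA is a linear projection: it flattens the roll onto the two directions of highest variance rather than unrolling it:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA

# Generate the classic Swiss roll: 1,000 points in three-dimensional space.
X, _ = make_swiss_roll(n_samples=1000, random_state=42)

# Project the data onto its two principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print("Original shape:", X.shape)     # (1000, 3)
print("Reduced shape: ", X_2d.shape)  # (1000, 2)
print("Explained variance ratio:", pca.explained_variance_ratio_)
```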

Association rule mining

This class of unsupervised ML algorithms helps us understand and extract patterns from transactional datasets. Such analysis is also termed Market Basket Analysis (MBA), and these algorithms help us identify interesting relationships and associations between items across transactions.

Using association rule mining, we can answer questions like what items are bought together by people at a given store?, or do people who buy wine also tend to buy cheese?, and many more. FP-growth, ECLAT, and Apriori are some of the most widely used algorithms for association rule mining tasks.
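As a minimal sketch, the following mines frequent itemsets with the Apriori implementation from the third-party mlxtend library (our choice; the book does not prescribe a library) and derives the confidence of one rule by hand from the itemset supports:

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.preprocessing import TransactionEncoder

# Toy store transactions, invented for illustration.
transactions = [
    ["bread", "milk"],
    ["bread", "wine", "cheese"],
    ["milk", "wine", "cheese"],
    ["bread", "milk", "wine", "cheese"],
    ["bread", "milk", "cheese"],
]

# One-hot encode the transactions into a boolean item matrix.
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions),
                  columns=te.columns_)

# Mine all itemsets appearing in at least 40% of transactions.
itemsets = apriori(df, min_support=0.4, use_colnames=True)
print(itemsets)

# confidence(wine -> cheese) = support({wine, cheese}) / support({wine})
support = dict(zip(itemsets["itemsets"], itemsets["support"]))
print("confidence(wine -> cheese):",
      support[frozenset({"wine", "cheese"})] / support[frozenset({"wine"})])
```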

Anomaly detection

Anomaly detection is the task of identifying rare events/observations based on historical data; it is also termed outlier detection. Anomalies or outliers usually have characteristics such as being infrequent or occurring in short, sudden bursts over time.

For such tasks, we provide a historical dataset to the algorithm so that it can identify and learn the normal behavior of the data in an unsupervised manner. Once this is learned, the algorithm helps us identify patterns that differ from the learned normal behavior.
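Here is a minimal sketch of this idea using scikit-learn's IsolationForest. The synthetic "historical" data and the assumed outlier fraction (contamination) are illustrative choices:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Historical data: mostly normal points plus a few injected outliers.
rng = np.random.RandomState(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(300, 2))
outliers = rng.uniform(low=-8, high=8, size=(10, 2))
X = np.vstack([normal, outliers])

# Learn the normal behavior without labels; contamination is our guess
# at the fraction of anomalies in the data.
detector = IsolationForest(contamination=0.03, random_state=42).fit(X)

# predict() returns +1 for inliers and -1 for anomalies.
labels = detector.predict(X)
print("Points flagged as anomalies:", np.sum(labels == -1))
```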