Hands-On Transfer Learning with Python

By: Dipanjan Sarkar, Nitin Panwar, Raghav Bali, Tamoghna Ghosh

Overview of this book

Transfer learning is a machine learning (ML) technique where knowledge gained while training on one set of problems can be used to solve other, similar problems. The purpose of this book is two-fold. Firstly, we focus on detailed coverage of deep learning (DL) and transfer learning, comparing and contrasting the two with easy-to-follow concepts and examples. Secondly, we focus on real-world examples and research problems using TensorFlow, Keras, and the Python ecosystem, with hands-on examples. The book starts with the key essential concepts of ML and DL, followed by coverage of important DL architectures such as convolutional neural networks (CNNs), deep neural networks (DNNs), recurrent neural networks (RNNs), long short-term memory (LSTM) networks, and capsule networks. Our focus then shifts to transfer learning concepts such as model freezing, fine-tuning, and pre-trained models including VGG, Inception, and ResNet, and to how these systems perform better than DL models, with practical examples. In the concluding chapters, we focus on a multitude of real-world case studies and problems associated with areas such as computer vision, audio analysis, and natural language processing (NLP). By the end of this book, you will be able to implement both DL and transfer learning principles in your own systems.
Table of Contents (14 chapters)

ML techniques

ML is a popular subfield of AI that covers a very wide scope. One of the reasons for this popularity is the comprehensive toolbox of sophisticated algorithms, techniques, and methodologies under its ambit. This toolbox has been developed and improved over the years, and new techniques are being researched on an ongoing basis. To understand and use the ML toolbox wisely, consider the following ways of categorizing it.

Categorization based on amount of human supervision:

  • Supervised learning: This class of learning involves a high degree of human supervision. Supervised learning algorithms use the training data and associated outputs to learn a mapping between the two, and then apply that mapping to unseen data. Classification and regression are the two major types of supervised learning algorithms.
  • Unsupervised learning: This class of algorithms attempts to learn inherent latent structures, patterns, and relationships from the input data without any associated outputs/labels (that is, without human supervision). Clustering, dimensionality reduction, and association rule mining are a few major types of unsupervised learning algorithms.
  • Semi-supervised learning: This class of algorithms is a hybrid of supervised and unsupervised learning. Here, the algorithms work with a small amount of labeled training data and a larger amount of unlabeled data, making creative use of both supervised and unsupervised methods to solve a given task.
  • Reinforcement learning: This class of algorithms is a bit different from supervised and unsupervised learning methods. The central entity here is an agent, which trains over a period of time while interacting with its environment to maximize some reward. The agent iteratively learns and changes its strategies/policies based on the rewards/penalties it receives from interacting with the environment.
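
The reinforcement learning loop described in the last bullet can be sketched in a few lines. This is only an illustrative toy, not a method taken from the book: the three-action environment, its fixed reward values, the epsilon-greedy strategy, and all names (`REWARDS`, `environment`, `estimates`) are assumptions made for the example.

```python
import random

# Hypothetical environment: three actions with fixed rewards unknown to the agent.
REWARDS = [0.2, 0.5, 0.8]


def environment(action):
    """Return the reward the agent receives for taking `action`."""
    return REWARDS[action]


rng = random.Random(0)       # seeded so that runs are reproducible
estimates = [0.0, 0.0, 0.0]  # the agent's running estimate of each action's reward
counts = [0, 0, 0]           # how often each action has been tried
epsilon = 0.1                # probability of exploring a random action

for _ in range(1000):
    if rng.random() < epsilon:
        # Explore: try a random action.
        action = rng.randrange(len(REWARDS))
    else:
        # Exploit: take the action currently believed to be best.
        action = max(range(len(REWARDS)), key=lambda a: estimates[a])
    reward = environment(action)
    counts[action] += 1
    # Incremental mean: move the estimate towards the observed reward.
    estimates[action] += (reward - estimates[action]) / counts[action]

best_action = max(range(len(REWARDS)), key=lambda a: estimates[a])
print("action preferred by the agent:", best_action)
```

The agent starts with no knowledge of the rewards and gradually shifts its policy towards the action that has paid off best so far, while occasional exploration keeps it from settling too early.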

Categorization based on data availability:

  • Batch learning: This is also termed offline learning. This type of learning is used when the required training data is available up front, so that a model can be trained and fine-tuned before it is deployed into production.
  • Online learning: As the name suggests, learning does not stop once the initial data has been consumed. Instead, data is fed into the system in mini-batches, and the training process continues as new batches of data arrive.
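
The contrast between the two modes can be sketched as follows. This is a minimal illustration under stated assumptions: the data, the one-parameter model y = w * x, the learning rate, and the helper names `batch_fit` and `online_update` are all invented for the example.

```python
# Hypothetical data that follows y = 2 * x exactly.
data = [(x, 2.0 * x) for x in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]


def batch_fit(samples):
    """Batch (offline) learning: least-squares fit of y = w * x on all data at once."""
    numerator = sum(x * y for x, y in samples)
    denominator = sum(x * x for x, _ in samples)
    return numerator / denominator


def online_update(w, mini_batch, learning_rate=0.01):
    """Online learning: one gradient step on the mean squared error of a mini-batch."""
    gradient = sum(2.0 * (w * x - y) * x for x, y in mini_batch) / len(mini_batch)
    return w - learning_rate * gradient


# Batch learning sees the complete dataset before producing a model.
w_batch = batch_fit(data)

# Online learning keeps updating the model as mini-batches arrive.
# The stream is simulated here by replaying the same six points in batches of two.
w_online = 0.0
for _ in range(200):
    for start in range(0, len(data), 2):
        w_online = online_update(w_online, data[start:start + 2])

print(w_batch, w_online)
```

Both approaches end up with a weight close to 2 on this toy data. The difference is that the online model is usable, and keeps improving, at every point in the stream.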

The previously discussed categorizations give us an abstract view of how ML algorithms can be organized, understood, and utilized. The most common way to categorize them is into supervised and unsupervised learning algorithms. Let's go into a bit more detail about these two categories, as this will prepare us for the more advanced topics introduced later.

Supervised learning

Supervised learning algorithms are a class of algorithms that utilize data samples (also called training samples) and corresponding outputs (or labels) to infer a mapping function between the two. The inferred mapping function, or learned function, is the output of this training process. The learned function is then used to map new and unseen data points (input elements) to outputs, which is also how its performance is tested.

Some key concepts for supervised learning algorithms are as follows:

  • Training dataset: The training samples and corresponding outputs utilized during the training process are termed training data. Formally, each sample in a training dataset is a two-element tuple consisting of an input element (usually a vector) and a corresponding output element or signal.
  • Test dataset: The unseen dataset that is utilized to test the performance of the learned function. Each sample in this dataset is also a two-element tuple containing an input data point and the corresponding output signal. Data points in this set are not used in the training phase (a validation set is also held out in practice; we will discuss this in more detail in subsequent chapters).
  • Learned function: This is the output of the training phase, also termed the inferred function or the model. This function is inferred from the training examples (input data points and their corresponding outputs) in the training dataset. An ideal model/learned function would learn the mapping in such a way that the results generalize to unseen data as well.
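
The three concepts above can be made concrete with a very small sketch. Everything in it is an assumption for illustration: the data values, the 0/1 labels, and the deliberately simple learning rule (a threshold placed midway between the two class means), which stands in for a real algorithm.

```python
# Hypothetical labelled samples: each one is an (input, output) tuple.
train_set = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]

# Held-out samples that are never used during training.
test_set = [(2.5, 0), (6.5, 1)]


def train(training_data):
    """Infer a mapping from inputs to labels and return it as a function."""
    class_0 = [x for x, label in training_data if label == 0]
    class_1 = [x for x, label in training_data if label == 1]
    # The "learning" step: place a threshold midway between the class means.
    threshold = (sum(class_0) / len(class_0) + sum(class_1) / len(class_1)) / 2.0

    def learned_function(x):
        return 1 if x >= threshold else 0

    return learned_function


model = train(train_set)

# Test the learned function on data it has not seen before.
correct = sum(model(x) == label for x, label in test_set)
accuracy = correct / len(test_set)
print("test accuracy:", accuracy)
```

The object returned by `train` is the learned function: it is built only from the training tuples and judged only on the test tuples.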

There are various supervised learning algorithms available. Based on the use case requirements, they can be broadly categorized into classification and regression models.

Classification

In the simplest terms, these algorithms help us answer objective questions or make yes-no predictions. For instance, these algorithms are useful for questions such as "Is it going to rain today?" or "Can this tumour be cancerous?".

Formally, the key objective of classification algorithms is to predict, from the input data points, output labels that are categorical in nature; that is, each label belongs to a discrete class or category.

Logistic regression, Support Vector Machines (SVMs), Neural Networks, Random Forests, k-Nearest Neighbours (KNN), Decision Trees, and so on are some of the popular classification algorithms.

Suppose we have a real-world use case to evaluate different car models. To keep things simple, let's assume that the model is expected to predict an output for every car model as either acceptable or unacceptable based on multiple input training samples. The input training samples have attributes such as buying price, number of doors, capacity (in number of persons), and safety.

The class label denotes each data point as either acceptable or unacceptable. The following diagram depicts the binary classification problem at hand. The classification algorithm takes the training samples as input to prepare a supervised model. This model is then utilized to predict the evaluation label for a new data point:

Supervised learning: Binary classification for car model evaluation

Since output labels are discrete classes in case of classification problems, if there are only two possible output classes the task is termed as a binary classification problem, and a multi-class classification otherwise. Predicting whether it will rain tomorrow or not would be a binary classification problem (with output being a yes or a no) while predicting a numeric digit from scanned handwritten images would be multi-class classification with 10 labels (zero to nine possible output labels).
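
A minimal sketch of the car evaluation example as a binary classifier is shown below. The training samples, the ordinal encoding of price and safety, and the choice of a k-nearest-neighbours vote are all assumptions made for illustration; the text does not prescribe a particular algorithm or dataset.

```python
from collections import Counter

# Hypothetical training samples: (buying_price, doors, capacity, safety) -> label.
# Price and safety use an assumed ordinal encoding: 1 = low, 2 = medium, 3 = high.
training_samples = [
    ((3, 2, 2, 1), "unacceptable"),
    ((3, 4, 4, 1), "unacceptable"),
    ((2, 2, 2, 1), "unacceptable"),
    ((3, 2, 2, 2), "unacceptable"),
    ((1, 4, 4, 3), "acceptable"),
    ((2, 4, 5, 3), "acceptable"),
    ((1, 4, 5, 2), "acceptable"),
    ((2, 5, 4, 3), "acceptable"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two attribute tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def predict(car, k=3):
    """Label a car by majority vote among its k nearest training samples."""
    nearest = sorted(training_samples, key=lambda s: squared_distance(s[0], car))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


new_car = (1, 4, 4, 3)  # low price, four doors, seats four, high safety
print(predict(new_car))
```

Because there are only two possible labels, this is a binary classification task; the same `predict` function would handle a multi-class problem if the training samples carried more than two distinct labels.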


Regression

This class of supervised learning algorithms helps us answer quantitative questions of the type "how many?" or "how much?". Formally, the key objective for regression models is value estimation. In this case, the output labels are continuous in nature (as opposed to being discrete in classification).

In the case of regression problems, the input data points are termed independent or explanatory variables, while the output is termed the dependent variable. Regression models are also trained using training data samples consisting of input (or independent) data points along with output (or dependent) signals. Linear regression, multivariate regression, regression trees, and so on are a few supervised regression algorithms.

Regression models can be further categorized based on how they model the relationship between dependent and independent variables.

Simple linear regression models work with a single independent variable and a single dependent variable. Ordinary Least Squares (OLS) regression is a popular linear regression model. Multiple regression (often loosely called multivariate regression) is where there is a single dependent variable, while each observation is a vector composed of multiple explanatory variables.
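
For the simple (one-variable) case, the OLS estimates have a well-known closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. The sketch below applies it to made-up data; the function name `ols_fit` and the sample values are assumptions for illustration.

```python
def ols_fit(xs, ys):
    """Closed-form OLS estimates for the model y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations scattered around the line y = 2 * x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.9, 5.1, 7.0, 9.1, 10.9]

slope, intercept = ols_fit(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```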

Polynomial regression models are a special case of multiple regression. Here, the dependent variable is modeled as an nth-degree polynomial of the independent variable. Since polynomial regression models fit nonlinear relationships between dependent and independent variables, they are also termed nonlinear regression models, although they remain linear in their coefficients.

The following is an example of linear regression:

Supervised learning: Linear regression

To understand different regression types, let's consider a real-world use case of estimating the stopping distance of a car, based on its speed. Here, based on the training data we have, we can model the stopping distance as a linear function of speed or as a polynomial function of the speed of the car. Remember that the main objective is to minimize the error without overfitting the training data itself.

The preceding graph depicts a linear fit, while the following one depicts a polynomial fit for the same dataset:

Supervised learning: Polynomial regression
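
The stopping distance comparison can be sketched numerically as well. The speed and distance values below are invented for illustration (they follow an exact quadratic so the outcome is easy to check), and `fit_polynomial` is a hypothetical helper that solves the least-squares normal equations directly; it is a sketch, not a numerically robust routine.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (A^T A) c = A^T y."""
    n = degree + 1
    # Normal equations for the basis 1, x, x^2, ..., stored as an augmented matrix.
    m = [
        [sum(x ** (i + j) for x in xs) for j in range(n)]
        + [sum((x ** i) * y for x, y in zip(xs, ys))]
        for i in range(n)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (m[i][n] - known) / m[i][i]
    return coeffs  # coeffs[k] multiplies x ** k


def predict(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))


def mean_squared_error(coeffs, xs, ys):
    return sum((predict(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical measurements: speed and the corresponding stopping distance.
speeds = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
distances = [12.0, 36.0, 72.0, 120.0, 180.0, 252.0]

linear = fit_polynomial(speeds, distances, degree=1)
quadratic = fit_polynomial(speeds, distances, degree=2)

mse_linear = mean_squared_error(linear, speeds, distances)
mse_quadratic = mean_squared_error(quadratic, speeds, distances)
print(f"linear MSE: {mse_linear:.3f}, quadratic MSE: {mse_quadratic:.3f}")
```

On this toy data the quadratic fit has a much lower training error than the linear fit. On real measurements, a lower training error alone would not justify the higher degree; the fit would also need to hold up on held-out data to rule out overfitting.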

Unsupervised learning

As the name suggests, this class of algorithms learns/infers concepts without supervision. Unlike supervised learning algorithms, which infer a mapping function from a training dataset consisting of input data points and output signals, unsupervised algorithms are tasked with finding patterns and relationships in the training data without any output signals being available. This class of algorithms utilizes the input dataset to detect patterns, mine for rules, or group/cluster data points so as to extract meaningful insights from the raw input dataset.

Unsupervised algorithms come in handy when we do not have the luxury of a training set that contains corresponding output signals or labels. In many real-world scenarios, datasets are available without output signals and it is difficult to label them manually. Unsupervised algorithms help plug such gaps.

Similar to supervised learning algorithms, unsupervised algorithms can also be categorized for ease of understanding and learning. The following are different categories of unsupervised learning algorithms.

Clustering

The unsupervised equivalent of classification is termed clustering. These algorithms help us group data points into different clusters or categories, without any output label being available in the input/training dataset. They try to find patterns and relationships in the input dataset, using inherent features to group data points based on some similarity measure, as shown in the following diagram:

Unsupervised learning: Clustering news articles

A real-world example to help understand clustering could be news articles. There are hundreds of news articles written daily, each catering to different topics ranging from politics and sports to entertainment, and so on. An unsupervised approach to group these articles together can be achieved using clustering, as shown in the preceding figure.

There are different approaches to perform the process of clustering. The most popular ones are:

  • Centroid-based methods. Popular ones are K-means and K-medoids.
  • Hierarchical clustering methods, both agglomerative and divisive. A popular one is Ward's method. (Affinity propagation, also popular, is an exemplar-based rather than a hierarchical method.)
  • Data distribution-based methods, for instance, Gaussian mixture models.
  • Density-based methods, such as DBSCAN.
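
As an illustration of the centroid-based family, here is a bare-bones K-means sketch. The two-dimensional points are hypothetical stand-ins for feature vectors (for example, numeric features extracted from news articles), and taking the first k points as the initial centroids is a simplifying assumption; practical implementations use more careful initialization.

```python
def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def k_means(points, k, iterations=10):
    """Alternate between assigning points to centroids and recomputing centroids."""
    centroids = list(points[:k])  # simplistic initialization (an assumption)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: squared_distance(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical feature vectors forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

centroids, clusters = k_means(points, k=2)
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

No labels are supplied anywhere: the grouping emerges purely from the distances between the points.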

Dimensionality reduction

Data and ML are the best of friends, yet a lot of issues come with more and bigger data. A large number of attributes, or a bloated feature space, is one common problem. A large feature space poses problems in analyzing and visualizing the data, along with issues related to training, memory, and space constraints. This is also known as the curse of dimensionality. Since unsupervised methods help us extract insights and patterns from unlabeled training datasets, they are also useful in helping us reduce dimensionality.

In other words, unsupervised methods help us reduce feature space by helping us select a representative set of features from the complete available list:

Unsupervised learning: Dimensionality reduction using PCA

Principal Component Analysis (PCA), nearest neighbors, and discriminant analysis are some of the popular dimensionality reduction techniques.

The preceding diagram is a famous depiction of the workings of the PCA-based dimensionality reduction technique. It shows a Swiss roll shape with data represented in three-dimensional space. Application of PCA results in transformation of the data into two-dimensional space, as shown on the right-hand side of the diagram.
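
The same idea can be sketched on a smaller scale by reducing two dimensions to one. The data points are made up for illustration, and the closed-form angle used to find the leading eigenvector applies only to the 2x2 covariance case assumed here; higher-dimensional data needs a general eigendecomposition.

```python
import math

# Hypothetical 2-D data lying close to the line y = x.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]

n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n
centered = [(x - mean_x, y - mean_y) for x, y in points]

# Entries of the 2x2 sample covariance matrix.
sxx = sum(x * x for x, _ in centered) / (n - 1)
syy = sum(y * y for _, y in centered) / (n - 1)
sxy = sum(x * y for x, y in centered) / (n - 1)

# For a symmetric 2x2 matrix, the leading eigenvector lies at this angle.
theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
component = (math.cos(theta), math.sin(theta))

# Project each centered point onto the first principal component: 2-D -> 1-D.
projected = [x * component[0] + y * component[1] for x, y in centered]

# Fraction of the total variance retained by the single component.
explained = (sum(p * p for p in projected) / (n - 1)) / (sxx + syy)
print(f"variance retained: {explained:.4f}")
```

Each point is now described by one number instead of two, and because the data was nearly one-dimensional to begin with, almost all of its variance survives the projection.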

Association rule mining

This class of unsupervised ML algorithms helps us understand and extract patterns from transactional datasets. Also termed Market Basket Analysis (MBA), these algorithms help us identify interesting relationships and associations between items across transactions.

Using association rule mining, we can answer questions such as "What items are bought together by people at a given store?" or "Do people who buy wine also tend to buy cheese?", and many more. FP-growth, ECLAT, and Apriori are some of the most widely used algorithms for association rule mining tasks.
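
The two standard measures behind such questions, support and confidence, can be computed directly on a small set of transactions. The transactions and the 0.4 support threshold below are assumptions for illustration, and the brute-force enumeration of item pairs is not FP-growth, ECLAT, or Apriori; those algorithms exist precisely to avoid this exhaustive search on large datasets.

```python
from itertools import combinations

# Hypothetical store transactions.
transactions = [
    {"wine", "cheese", "bread"},
    {"wine", "cheese"},
    {"bread", "butter"},
    {"wine", "cheese", "grapes"},
    {"bread", "cheese"},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    return support(antecedent | consequent) / support(antecedent)


# Do people who buy wine also tend to buy cheese?
print(support({"wine", "cheese"}))       # 0.6
print(confidence({"wine"}, {"cheese"}))  # 1.0

# Which pairs of items are bought together in at least 40% of transactions?
items = sorted(set().union(*transactions))
frequent_pairs = [set(pair) for pair in combinations(items, 2) if support(set(pair)) >= 0.4]
print(frequent_pairs)
```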

Anomaly detection

Anomaly detection is the task of identifying rare events/observations based on historical data. Anomaly detection is also termed outlier detection. Anomalies or outliers usually have characteristics such as being infrequent or occurring in short, sudden bursts over time.

For such tasks, we provide a historical dataset for the algorithm so it can identify and learn the normal behavior of data in an unsupervised manner. Once learned, the algorithm helps us identify patterns that differ from this learned behavior.
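
A very simple version of this idea is sketched below, assuming that "normal behavior" can be summarized by the mean and standard deviation of the historical values. The readings, the three-standard-deviation threshold, and the function name `is_anomaly` are assumptions for illustration; practical detectors are usually more elaborate.

```python
import statistics

# Hypothetical historical readings that represent normal behavior.
history = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.0, 9.7, 10.1, 9.9]

# "Learning" the normal behavior: estimate its center and spread.
mean = statistics.mean(history)
stdev = statistics.stdev(history)


def is_anomaly(value, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the learned mean."""
    return abs(value - mean) / stdev > threshold


for reading in [10.2, 15.0]:
    print(reading, "anomaly" if is_anomaly(reading) else "normal")
```

No reading in the history is labeled as normal or anomalous; the detector only learns what typical values look like and flags departures from them.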