Machine Learning Algorithms - Second Edition

Overview of this book

Machine learning has gained tremendous popularity for its powerful and fast predictions on large datasets. However, the true forces behind its powerful output are the complex algorithms, involving substantial statistical analysis, that churn through large datasets and generate substantial insight. This second edition of Machine Learning Algorithms walks you through prominent developments in machine learning algorithms, which constitute major contributions to the machine learning process and help you to strengthen and master statistical interpretation across the areas of supervised, semi-supervised, and reinforcement learning. Once the core concepts of an algorithm have been covered, you'll explore real-world examples based on the most widespread libraries, such as scikit-learn, NLTK, TensorFlow, and Keras. You will discover new topics such as principal component analysis (PCA), independent component analysis (ICA), Bayesian regression, discriminant analysis, advanced clustering, and Gaussian mixtures. By the end of this book, you will have studied machine learning algorithms and be able to put them into production to make your machine learning applications more innovative.

Introduction – classic and adaptive machines

Since time immemorial, human beings have built tools and machines to simplify their work and reduce the overall effort needed to complete many different tasks. Even without knowing any physical law, they invented levers (formally described for the first time by Archimedes), instruments, and more complex machines to carry out longer and more sophisticated procedures. Hammering a nail became easier and less painful thanks to a simple trick, and so did moving heavy stones or wood using a cart. But what's the difference between these two examples? Even if the latter is still a simple machine, its complexity allows a person to carry out a composite task without thinking about each step. Some fundamental mechanical laws play a primary role in allowing a horizontal force to counteract gravity efficiently, but neither human beings, nor horses or oxen, knew anything about them. Primitive people simply observed how an ingenious trick (the wheel) could improve their lives.

The lesson we've learned is that a machine is never efficient or trendy without a concrete possibility of using it pragmatically. A machine is immediately considered useful and destined to be continuously improved if its users can easily understand what tasks can be completed with less effort or automatically. In the latter case, some intelligence seems to appear next to cogs, wheels, or axles. So, a further step can be added to our evolution list: automatic machines, built (nowadays, we'd say programmed) to accomplish specific goals by transforming energy into work. Windmills and watermills are examples of elementary tools that are able to carry out complete tasks with minimal (compared to a direct activity) human control.

In the following diagram, there's a generic representation of a classical system that receives some input values, processes them, and produces output results:

Interaction diagram of a classic/non-adaptive system

But again, what's the key to the success of a mill? It's not an exaggeration to say that human beings have tried to transfer some intelligence into their tools since the dawn of technology. Both the water in a river and the wind show a behavior that we can simply call flowing. They have a lot of energy to give us free of charge, but a machine should have some awareness to facilitate this process. A wheel can turn around a fixed axle millions of times, but the wind must find a suitable surface to push on. The answer seems obvious, but you should try to think about people without any knowledge or experience; even if implicitly, they started a brand new approach to technology. If you prefer to reserve the word intelligence for more recent results, it's possible to say that the path started with tools, moved first to simple machines, and then to smarter ones.

Skipping further intermediate (but no less important) steps, we can jump into our epoch and change the scope of our discussion. Programmable computers are widespread, flexible, and increasingly powerful instruments; moreover, the spread of the internet has allowed us to share software applications and related information with minimal effort. The word-processing software that I'm using, my email client, a web browser, and many other common tools running on the same machine are all examples of such flexibility. It's undeniable that the IT revolution dramatically changed our lives and sometimes improved our daily jobs, but without machine learning (and all its applications), there are still many tasks that seem far outside the computer's domain. Spam filtering, Natural Language Processing (NLP), visual tracking with a webcam or a smartphone, and predictive analysis are only a few of the applications that have revolutionized human-machine interaction and increased our expectations. In many cases, they transformed our electronic tools into actual cognitive extensions that are changing the way we interact with many daily situations. They achieved this goal by filling the gap between human perception, language, reasoning, and models on the one hand, and artificial instruments on the other.

Here's a schematic representation of an adaptive system:

Interaction diagram of an adaptive system

Such a system isn't based on static or permanent structures (model parameters and architectures), but rather on a continuous ability to adapt its behavior to external signals (datasets or real-time inputs) and, like a human being, to predict the future using uncertain and fragmentary pieces of information.

Before moving on with a more specific discussion, let's briefly define the different kinds of system analysis that can be performed. These techniques are often structured as a sequence of specific operations whose goal is to increase the overall domain knowledge and to allow specific questions to be answered; however, in some cases, it's possible to limit the process to a single step in order to meet specific business needs. I always suggest briefly considering them all, because many particular operations make sense only when certain conditions are met. A clear understanding of the problem and its implications is the best way to make the right decisions, also taking into consideration possible future developments.

Descriptive analysis

Before trying any machine learning solution, it's necessary to create an abstract description of the context. The best way to achieve this goal is to define a mathematical model, which has the advantage of being immediately comprehensible to anybody (given the necessary background knowledge). However, the goal of descriptive analysis is to find an accurate description of the observed phenomena and to validate all the hypotheses. Let's suppose that our task is to optimize the supply chain of a large store. We start collecting data about purchases and sales and, after a discussion with a manager, we define the generic hypothesis that the sales volume increases on the day before the weekend. This means that our model should be based on a periodicity. A descriptive analysis has the task of validating it, but also of discovering any other particular features that were initially neglected.

At the end of this stage, we should know, for example, whether the time series (let's suppose we consider only one variable) is periodic, whether it has a trend, whether it's possible to find a set of standard rules, and so forth. A further step (that I prefer to consider together with this one) is to define a diagnostic model that must be able to connect all the effects with precise causes. This process seems to go in the opposite direction, but its goal is very close to that of descriptive analysis. In fact, whenever we describe a phenomenon, we are naturally driven to finding a rational reason that justifies each specific step. Let's suppose that, after having observed the periodicity in our time series, we find a sequence that doesn't obey this rule. The goal of diagnostic analysis is to give a suitable answer (for example, that the store is open on Sundays). This new piece of information enriches our knowledge and specializes it: now, we can state that the series is periodic only when there is a day off, and therefore (clearly, this is a trivial example) we don't expect an increase in sales before a working day. As many machine learning models have specific prerequisites, a descriptive analysis allows us to immediately understand whether a model will perform poorly or whether it's the best choice considering all the known factors. In all of the examples we will look at, we are going to perform a brief descriptive analysis by defining the features of each dataset and what we can observe. As the goal of this book is to focus on adaptive systems, we don't have space for a complete description, but I always invite the reader to imagine new possible scenarios, performing a virtual analysis before defining the models.
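As a minimal sketch of such a check (using a synthetic daily sales series, since no real data is available here, and with all values purely illustrative), the weekly periodicity hypothesis can be tested by looking at the autocorrelation of the series at a few lags:

import numpy as np

# Synthetic daily sales: a weekly peak (the day before the day off) plus noise.
# The data is purely illustrative; a real analysis would load actual sales records.
rng = np.random.default_rng(1000)
days = np.arange(365)
sales = 100.0 + 25.0 * (days % 7 == 5) + rng.normal(0.0, 5.0, size=days.shape)

def autocorrelation(x, lag):
    # Sample autocorrelation at a given lag (values close to 1.0 suggest periodicity)
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

for lag in (1, 3, 7, 14):
    print('Lag {}: autocorrelation = {:.3f}'.format(lag, autocorrelation(sales, lag)))

A markedly higher value at lag 7 (and at its multiples) than at the other lags supports the hypothesis of a weekly periodicity; the absence of such a peak would suggest revising the initial hypothesis.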

Predictive analysis

The goal of machine learning is almost always related to this precise stage. In fact, once we have defined a model of our system, we need to infer its future states, given some initial conditions. This process is based on the discovery of the rules that underlie the phenomenon, so as to push them forward in time (in the case of a time series) and observe the results. Of course, the goal of a predictive model is to minimize the error between the actual and predicted values, considering all possible interfering factors.
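As a minimal sketch (reusing the synthetic sales series from the previous snippet; the one-hot encoding and the linear model are chosen purely for illustration), we can fit a simple predictive model with scikit-learn and measure its error on unseen data:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Synthetic daily sales with a weekly peak (illustrative data, as before)
rng = np.random.default_rng(1000)
days = np.arange(365)
sales = 100.0 + 25.0 * (days % 7 == 5) + rng.normal(0.0, 5.0, size=days.shape)

# Encode each day by its day of the week (one-hot); this assumes the weekly
# pattern found by the descriptive analysis is the only relevant feature
X = np.eye(7)[days % 7]
X_train, X_test = X[:300], X[300:]
y_train, y_test = sales[:300], sales[300:]

model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# The quality of the model is measured as the error between the actual and
# predicted values on days that were never seen during training
print('MAE on held-out days: {:.2f}'.format(mean_absolute_error(y_test, predictions)))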

In the example of the large store, a good model should be able to forecast a peak before a day off and normal behavior in all the other cases. Moreover, once a predictive model has been defined and trained, it can be used as a fundamental part of a decision-based process. In this case, the prediction must be turned into a suggested prescription. For example, the object detector of a self-driving car can be extremely accurate and detect an obstacle in time. However, what is the best action to perform in order to achieve a specific goal? According to the prediction (position, size, speed, and so on), another model must be able to pick the action that minimizes the risk of damage and maximizes the probability of a safe movement. This is a common task in reinforcement learning, but it's also extremely useful whenever a manager has to make a decision in a context where there are many factors. The resultant model is, hence, a pipeline that is fed with raw inputs and uses the individual outcomes as inputs for subsequent models. Returning to our initial example, the store manager is not interested in discovering the hidden oscillations, but in the right volumes of goods to order every day. Therefore, the first step is predictive analysis, while the second is a prescriptive one, which can take into account many factors that are discarded by the previous model (for example, different suppliers can have shorter or longer delivery times, or they can apply discounts according to the volume).
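A minimal sketch of this prescriptive step (with hypothetical figures for the forecast, the margin, and the overstock cost) could turn a demand prediction into an order quantity by maximizing a simple expected-profit function:

import numpy as np

# Suppose the predictive model forecasts tomorrow's demand for a product.
# All figures below (forecast, margin, overstock cost) are purely illustrative.
forecast_demand = 125.0
forecast_std = 10.0          # uncertainty of the prediction
unit_margin = 4.0            # profit per unit sold
unit_overstock_cost = 1.5    # cost of each unsold unit (storage, spoilage, ...)

def expected_profit(order_quantity, n_samples=10000):
    # Monte Carlo estimate of the profit, averaging over plausible demand values
    rng = np.random.default_rng(1000)
    demand = rng.normal(forecast_demand, forecast_std, size=n_samples)
    sold = np.minimum(order_quantity, demand)
    unsold = np.maximum(order_quantity - demand, 0.0)
    return np.mean(unit_margin * sold - unit_overstock_cost * unsold)

# The prescriptive step: pick the order quantity that maximizes the objective
candidates = range(100, 161)
best_order = max(candidates, key=expected_profit)
print('Suggested order quantity: {}'.format(best_order))

In a realistic pipeline, the forecast and its uncertainty would come from the predictive model, and the objective function would also encode the supplier-related factors mentioned above (delivery times, volume discounts, and so on).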

So, the manager will probably define a goal in terms of a function to maximize (or minimize), and the model has to find the best amount of goods to order so as to fulfill the main requirement (which, of course, is availability, and depends on the sales prediction). In the remaining part of this book, we are going to discuss many solutions to specific problems, focusing on the predictive stage. But, in order to move on, we need to define what learning means and why it's so important in an ever-increasing range of business contexts.