
Machine learning algorithms


Now, let's look at the important machine learning algorithms and some brief details about each of them. In-depth implementation aspects for each algorithm will be covered in later chapters. These algorithms can be classified either by problem type or by learning type. The classification given here is simple and intuitive, but not necessarily exhaustive.

There are many ways of classifying or grouping machine learning algorithms, and in this book we will group them by learning model. In each chapter, starting from Chapter 5, Decision Tree based learning, we will cover one or more learning models and their associated algorithms. The sections that follow list these learning models and the algorithms that belong to each.

Decision tree based algorithms

Decision tree based algorithms build models iteratively or recursively from the data provided. The goal of decision tree based algorithms is to predict the value of a target variable given a set of input variables. Decision trees help solve classification and regression problems using tree based methods: decisions fork through the tree structure until a prediction is made for a given record. Some of the algorithms are as follows (a minimal sketch follows the list):

  • Random forest

  • Classification and Regression Tree (CART)

  • C4.5 and C5.0

  • Chi-square

  • Gradient boosting machines (GBM)

  • Chi-Squared Automatic Interaction Detection (CHAID)

  • Decision stump

  • Multivariate adaptive regression splines (MARS)
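
To make this concrete, the following is a minimal sketch of a decision tree classifier. It assumes scikit-learn and its bundled iris dataset purely for illustration; the book's own in-depth implementations come in later chapters:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Load a small labelled dataset and hold out a test split
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # Fit a CART-style tree; max_depth limits how far decisions can fork
    tree = DecisionTreeClassifier(max_depth=3, random_state=42)
    tree.fit(X_train, y_train)
    print("Test accuracy:", tree.score(X_test, y_test))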

Bayesian method based algorithms

Bayesian methods are those that explicitly apply Bayes' theorem, again to solve classification and regression problems. Bayesian methods facilitate subjective probability in modeling. The following are some of the Bayesian based algorithms, with a short sketch after the list:

  • Naïve Bayes

  • Averaged one-dependence estimators (AODE)

  • Bayesian belief network (BBN)
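
As an illustration, here is a minimal Naïve Bayes sketch; the use of scikit-learn's GaussianNB and the iris dataset is an assumption made for brevity:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # GaussianNB applies Bayes' theorem, assuming conditionally independent features
    model = GaussianNB()
    model.fit(X_train, y_train)
    print("Test accuracy:", model.score(X_test, y_test))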

Kernel method based algorithms

When we hear about kernel methods, the first thing that comes to mind is Support Vector Machines (SVM). These methods are usually a group of methods in themselves. Kernel methods are concerned with pattern analysis and, as explained in the preceding sections, the crux of pattern analysis includes various mapping techniques; here, the data is mapped into vector spaces in which patterns can be found. Some examples of kernel method based learning algorithms are listed as follows, with a brief sketch after the list:

  • SVM

  • Linear discriminant analysis (LDA)
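
The following sketch shows the kernel idea in practice, assuming scikit-learn's SVC and a synthetic two-moons dataset for illustration. The RBF kernel implicitly maps the points into a higher-dimensional vector space where a linear separator exists:

    from sklearn.datasets import make_moons
    from sklearn.svm import SVC

    # A dataset that is not linearly separable in its original space
    X, y = make_moons(n_samples=200, noise=0.2, random_state=42)

    # The RBF kernel performs the mapping implicitly (the "kernel trick")
    clf = SVC(kernel="rbf", C=1.0, gamma="scale")
    clf.fit(X, y)
    print("Training accuracy:", clf.score(X, y))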

Clustering methods

Clustering, like regression, describes both a class of problems and a class of methods. Clustering methods are typically organized by modeling approach, such as centroid-based or hierarchical. These methods organize data into groups by assessing the similarity in the structure of the input data (a short sketch follows the list):

  • K-means

  • Expectation maximization (EM) and Gaussian mixture models (GMM)
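
A minimal K-means sketch follows, assuming NumPy and scikit-learn and a toy dataset of two synthetic groups:

    import numpy as np
    from sklearn.cluster import KMeans

    # Unlabelled points drawn from two rough groups
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

    # K-means assigns each point to the nearest of k centroids,
    # then recomputes the centroids until assignments stabilize
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print("Centroids:\n", kmeans.cluster_centers_)

Note that the number of clusters, k, must be chosen up front; this is one of the key modeling decisions with centroid-based methods.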

Artificial neural networks (ANN)

Similar to kernel methods, artificial neural networks are a class of pattern matching techniques, but these models are inspired by the structure of biological neural networks. These methods are again used to solve classification and regression problems. They relate to deep learning modeling and have many subfields of algorithms that help solve specific problems in context.

Some of the methods in this category include the following (a from-scratch sketch follows the list):

  • Learning vector quantization (LVQ)

  • Self-organizing maps (SOM)

  • Hopfield network

  • Perceptron

  • Backpropagation
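
To show the core mechanism, here is a from-scratch perceptron sketch in plain NumPy, learning the logical AND function; the data and hyperparameters are illustrative:

    import numpy as np

    # Training data for the logical AND function
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([0, 0, 0, 1])

    w = np.zeros(2)   # one weight per input
    b = 0.0           # bias term
    lr = 0.1          # learning rate

    # Perceptron learning rule: nudge the weights whenever a prediction is wrong
    for epoch in range(20):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            w += lr * (target - pred) * xi
            b += lr * (target - pred)

    print([1 if xi @ w + b > 0 else 0 for xi in X])  # expected: [0, 0, 0, 1]

Backpropagation generalizes this idea of error-driven weight updates to networks with multiple layers.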

Dimensionality reduction

Like clustering methods, dimensionality reduction methods work iteratively on the data structure in an unsupervised manner. Given a dataset, more dimensions mean more work in the machine learning implementation, so the idea is to iteratively reduce the number of dimensions and bring the more relevant dimensions forward. This technique is usually used to simplify high-dimensional data before applying a supervised learning technique. Some example dimensionality reduction methods are listed as follows, with a short sketch after the list:

  • Multidimensional scaling (MDS)

  • Principal component analysis (PCA)

  • Projection pursuit (PP)

  • Partial least squares (PLS) regression

  • Sammon mapping
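
A short PCA sketch follows, assuming NumPy and scikit-learn, with a toy dataset in which one of three features is nearly redundant:

    import numpy as np
    from sklearn.decomposition import PCA

    # Three features, where the third nearly duplicates the first
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    X = np.hstack([X, X[:, :1] + 0.01 * rng.normal(size=(100, 1))])

    # Project onto the two directions that retain the most variance
    pca = PCA(n_components=2)
    X_reduced = pca.fit_transform(X)
    print("Explained variance ratio:", pca.explained_variance_ratio_)

The reduced data can then be fed into a supervised learning technique.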

Ensemble methods

As the name suggests, ensemble methods combine multiple models that are built independently, with the combined results responsible for the overall prediction. It is critical to identify which independent models are to be combined and how their results are to be combined to achieve the required result. The combined models are sometimes referred to as weaker models, as their individual results need not fulfill the expected outcome in isolation. This is a very powerful and widely adopted class of techniques. The following are some of the ensemble method algorithms (a comparison sketch follows the list):

  • Random forest

  • Boosting

  • AdaBoost

  • Bootstrapped Aggregation (Bagging)

  • Stacked generalization (blending)

  • Gradient boosting machines (GBM)
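
The following sketch contrasts a single (weaker) tree with a bagged ensemble of trees; scikit-learn and its bundled breast cancer dataset are assumed for illustration:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    # A single tree versus a random forest of 100 independently built trees
    tree = DecisionTreeClassifier(random_state=0)
    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    print("Single tree:", cross_val_score(tree, X, y, cv=5).mean())
    print("Random forest:", cross_val_score(forest, X, y, cv=5).mean())

The forest typically scores higher because averaging many independently trained trees reduces the variance of any single model.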

Instance based learning algorithms

Instances are nothing but individual examples from a dataset, and instance based learning models work on an identified instance or group of instances that are critical to the problem. The results across stored instances are compared with new data using a particular similarity measure to find the best match and predict. Instance based methods are also called case-based or memory-based learning. Here the focus is on the representation of the instances and on the similarity measures for comparison between instances. Some of the instance based learning algorithms are listed as follows, with a short sketch after the list:

  • k-Nearest Neighbour (k-NN)

  • Learning vector quantization (LVQ)

  • Self-organizing maps (SOM)
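
A minimal k-NN sketch follows, assuming scikit-learn and the iris dataset; note that the "model" is essentially the stored training instances themselves:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # Each prediction compares a new instance with its 5 nearest stored
    # instances, using Euclidean distance as the similarity measure
    knn = KNeighborsClassifier(n_neighbors=5)
    knn.fit(X_train, y_train)
    print("Test accuracy:", knn.score(X_test, y_test))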

Regression analysis based algorithms

Regression is the process of iteratively refining a model based on the error generated by the model. Regression is also used to define a machine learning problem type. Some example algorithms in regression, followed by a brief sketch, are:

  • Ordinary least squares linear regression

  • Logistic regression

  • Multivariate adaptive regression splines (MARS)

  • Stepwise regression
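
Here is a brief ordinary least squares sketch, assuming NumPy and scikit-learn and synthetic data generated from a known line:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Noisy samples of y = 3x + 2
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(100, 1))
    y = 3 * X[:, 0] + 2 + rng.normal(0, 1, size=100)

    # Ordinary least squares picks the coefficients that minimize squared error
    model = LinearRegression().fit(X, y)
    print("Slope:", model.coef_[0], "Intercept:", model.intercept_)

The recovered slope and intercept should be close to 3 and 2, with the residual error driving any further refinement of the model.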

Association rule based learning algorithms

Given the variables, association rule based learning algorithms extract and define rules that can be applied to a dataset, demonstrating experience-based learning and thus prediction. These rules, when applied in a multi-dimensional data context, can be useful in a commercial context as well. Some examples of association rule based algorithms are given as follows, with a small sketch after the list:

  • The Apriori algorithm

  • The Eclat algorithm
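
To illustrate the Apriori idea, here is a small from-scratch sketch over toy market-basket transactions; the data and the 0.6 support threshold are made up for the example:

    from itertools import combinations

    # Toy market-basket transactions
    transactions = [
        {"bread", "milk"},
        {"bread", "butter", "milk"},
        {"bread", "butter"},
        {"milk", "butter"},
        {"bread", "milk", "butter"},
    ]
    min_support = 0.6

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset
        return sum(itemset <= t for t in transactions) / len(transactions)

    # Apriori keeps frequent single items first, then builds candidate
    # pairs only from those survivors, pruning by minimum support
    items = {i for t in transactions for i in t}
    frequent_1 = {frozenset([i]) for i in items if support({i}) >= min_support}
    candidates = {a | b for a, b in combinations(frequent_1, 2) if len(a | b) == 2}
    frequent_2 = {c for c in candidates if support(c) >= min_support}

    for itemset in frequent_2:
        print(sorted(itemset), "support:", support(itemset))

Rules such as "bread implies milk" are then derived from these frequent itemsets and scored with measures like confidence.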