Ensemble Machine Learning Cookbook

By: Dipayan Sarkar, Vijayalakshmi Natarajan

Overview of this book

Ensemble modeling is an approach used to improve the performance of machine learning models. It combines two or more similar or dissimilar machine learning algorithms to deliver superior predictive power. This book will help you to implement popular machine learning algorithms that cover different paradigms of ensemble machine learning, such as boosting, bagging, and stacking.

The Ensemble Machine Learning Cookbook starts by getting you acquainted with the basics of ensemble techniques and exploratory data analysis. You'll then implement statistical and machine learning algorithms to understand how ensembles of multiple heterogeneous algorithms work. The book also ensures that you don't miss out on key topics, such as resampling methods. As you progress, you'll get a better understanding of bagging, boosting, stacking, and working with the Random Forest algorithm using real-world examples. The book highlights how these ensemble methods use multiple models to improve machine learning results, as compared to a single model.

In the concluding chapters, you'll delve into advanced ensemble models using neural networks, natural language processing, and more. You'll also be able to implement models for tasks such as fraud detection, text categorization, and sentiment analysis. By the end of this book, you'll be able to harness ensemble techniques and the working mechanisms of machine learning algorithms to build intelligent models using individual recipes.
Table of Contents (14 chapters)

What this book covers

Chapter 1, Get Closer to Your Data, explores a dataset through hands-on Python coding, performing exploratory data analysis with statistical methods and visualization.

Chapter 2, Getting Started with Ensemble Machine Learning, explores what ensemble learning is and how it can help in real-life scenarios. Basic ensemble techniques, including averaging, weighted averaging, and max-voting, are explained. These techniques form the basis for more sophisticated ensemble methods, and understanding them lays the groundwork for the more advanced chapters that follow.
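The three basic techniques named for Chapter 2 can be sketched in a few lines of plain Python. This is only an illustration and is not taken from the book: the function names and the sample model outputs are hypothetical.

```python
from collections import Counter

def average(predictions):
    """Simple average of numeric predictions from several models."""
    return sum(predictions) / len(predictions)

def weighted_average(predictions, weights):
    """Average in which each model's prediction counts in proportion to its weight."""
    return sum(p * w for p, w in zip(predictions, weights)) / sum(weights)

def max_vote(labels):
    """Return the class label predicted by the largest number of models."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical outputs of three models for a single sample
probs = [0.5, 0.75, 1.0]
print(average(probs))                      # 0.75
print(weighted_average(probs, [2, 1, 1]))  # 0.6875
print(max_vote(["spam", "ham", "spam"]))   # spam
```

When votes are tied, `Counter.most_common` returns the label that was seen first, so a real implementation may want an explicit tie-breaking rule.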

Chapter 3, Resampling Methods, introduces the various types of resampling methods that are used by machine learning algorithms. Each resampling method has its advantages and disadvantages, which are explained to the readers. The readers also learn the code to be executed for each type of sampling.

Chapter 4, Statistical and Machine Learning Algorithms, introduces a handful of algorithms that will be useful when we get into ensembles of multiple heterogeneous algorithms. This chapter uses scikit-learn to prepare all the algorithms to be used.

Chapter 5, Bag the Models with Bagging, provides the readers with an understanding of what bootstrap aggregation is and how the bootstrap results can be aggregated, in a process also known as bagging.
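As a rough illustration of bootstrap aggregation, the sketch below draws bootstrap samples (sampling with replacement), fits one toy model per sample, and averages their predictions. It is not the book's code: the 1-nearest-neighbour base learner, the function names, and the data are all assumptions chosen to keep the example self-contained.

```python
import random
from statistics import mean

def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]

def fit_nearest_neighbour(sample):
    """Toy base learner: predict the target of the closest training point."""
    def predict(x):
        nearest = min(sample, key=lambda pair: abs(pair[0] - x))
        return nearest[1]
    return predict

def bagging_predict(data, x, n_models=25, seed=0):
    """Train one model per bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    models = [fit_nearest_neighbour(bootstrap_sample(data, rng))
              for _ in range(n_models)]
    return mean(model(x) for model in models)

# Hypothetical (feature, target) pairs
data = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
print(bagging_predict(data, 2.5))
```

Each bootstrap sample omits some points and repeats others, so the individual models disagree. Averaging them is the aggregation step that smooths out that disagreement.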

Chapter 6, When in Doubt, Use Random Forests, introduces the random forest algorithm. It shows readers which ensemble techniques Random Forest uses, how it applies them, and how this helps our models avoid overfitting.

Chapter 7, Boosting Model Performance with Boosting, introduces boosting and discusses how it helps to improve a model's performance by reducing variance and increasing accuracy. This chapter also explains that boosting is not robust against outliers and noisy data, but is flexible and can be used with different loss functions.

Chapter 8, Blend It with Stacking, applies stacking to learn the optimal combination of base learners. This chapter will acquaint readers with stacking, which is also known as stacked generalization.
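To give a feel for what "learning the optimal combination of base learners" can mean, here is a deliberately simplified sketch with two base learners and a one-parameter linear meta-learner. It assumes the base predictions were already produced on held-out data. The function names and numbers are hypothetical, and full stacking typically uses a more general meta-model.

```python
def fit_blend_weight(p1, p2, y):
    """Least-squares weight w for the blend w*p1 + (1 - w)*p2.

    p1, p2: held-out predictions of two base learners; y: true targets.
    """
    num = sum((a - b) * (t - b) for a, b, t in zip(p1, p2, y))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    if den == 0:
        return 0.5  # base learners agree everywhere, so any weight gives the same blend
    return num / den

def blend(w, a, b):
    """Combine two base predictions using the learned weight."""
    return w * a + (1 - w) * b

# Hypothetical held-out predictions and targets
p1 = [1.0, 2.0, 3.0]
p2 = [3.0, 2.0, 5.0]
y = [2.0, 2.0, 4.0]

w = fit_blend_weight(p1, p2, y)
print(w)                    # 0.5
print(blend(w, 1.0, 3.0))   # 2.0
```

The weight is fitted on predictions for data the base learners did not train on. Fitting it on training predictions would tend to favour whichever base learner overfits most.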

Chapter 9, Homogeneous Ensemble Using Keras, is a complete code walk-through of a classification case study for recognizing handwritten digits with homogeneous algorithms, in this case multiple neural network models built using Keras.

Chapter 10, Heterogeneous Ensemble Classifiers Using H2O, is a complete code walk-through of a classification case study for default prediction with an ensemble of multiple heterogeneous algorithms using H2O.

Chapter 11, Heterogeneous Ensemble for Text Classification Using NLP, is a complete code walk-through of a classification case study to classify sentiment polarity using an ensemble of multiple heterogeneous algorithms. Here, NLP techniques such as semantic analysis are used to improve the accuracy of classification. The mined text information is then passed to ensemble classification techniques for sentiment analysis. In this case study, the H2O library is used for building models.

Chapter 12, Homogeneous Ensemble for Multiclass Classification Using Keras, is a complete code walk-through of a multiclass classification case study that builds a homogeneous ensemble based on data diversity, using the tf.keras module from TensorFlow.