Advanced Machine Learning with R

By: Cory Lesmeister, Dr. Sunil Kumar Chinnamgari

Overview of this book

R is one of the most popular languages for exploring the mathematical side of machine learning and for performing computational statistics with ease. This Learning Path shows you how to leverage the R ecosystem to build efficient machine learning applications that carry out intelligent tasks within your organization. You’ll work through realistic projects, such as building powerful ensemble models to predict employee attrition. Next, you’ll explore different clustering techniques to segment customers using wholesale data and even apply TensorFlow and Keras-R to perform advanced computations. Each chapter will help you implement advanced machine learning algorithms using real-world examples. You’ll also be introduced to reinforcement learning along with its use cases and models. Finally, this Learning Path will give you a glimpse into how some of these black-box models can be diagnosed and understood. By the end of this Learning Path, you’ll be equipped with the skills you need to deploy machine learning techniques in your own projects.

Title Page

About Packt

Contributors

Preface

Preparing and Understanding Data

Overview

Reading the data

Handling duplicate observations

Handling missing values

Zero and near-zero variance features

Treating the data

Summary

Linear Regression

Univariate linear regression

Multivariate linear regression

Summary

Logistic Regression

Classification methods and linear regression

Logistic regression

Model training and evaluation

Summary

Advanced Feature Selection in Linear Models

Regularization overview

Data creation

Modeling and evaluation

Summary

K-Nearest Neighbors and Support Vector Machines

K-nearest neighbors

Support vector machines

Manipulating data

Modeling and evaluation

Summary

Tree-Based Classification

An overview of the techniques

Datasets and modeling

Summary

Neural Networks and Deep Learning

Introduction to neural networks

Deep learning – a not-so-deep overview

Creating a simple neural network

An example of deep learning

Summary

Creating Ensembles and Multiclass Methods

Ensembles

Data understanding

Modeling and evaluation

Summary

Cluster Analysis

Hierarchical clustering

Data understanding and preparation

Modeling

Summary

Principal Component Analysis

An overview of the principal components

Data

PCA modeling

Summary

Association Analysis

An overview of association analysis

Data understanding

Data preparation

Modeling and evaluation

Summary

Time Series and Causality

Univariate time series analysis

Time series data

Modeling and evaluation

Summary

Text Mining

Text mining framework and methods

N-grams

Additional quantitative analysis

Summary

Exploring the Machine Learning Landscape

ML versus software engineering

Types of ML methods

ML terminology – a quick review

ML project pipeline

Learning paradigm

Datasets

Summary

Predicting Employee Attrition Using Ensemble Models

Philosophy behind ensembling

Getting started

Understanding the attrition problem and the dataset

K-nearest neighbors model for benchmarking the performance

Bagging

Randomization with random forests

Boosting

Stacking

Summary

Implementing a Jokes Recommendation Engine

Fundamental aspects of recommendation engines

Getting started

Understanding the Jokes recommendation problem and the dataset

Building a recommendation system with an item-based collaborative filtering technique

Building a recommendation system with a user-based collaborative filtering technique

Building a recommendation system based on an association-rule mining technique

Content-based recommendation engine

Building a hybrid recommendation system for Jokes recommendations

Summary

References

Sentiment Analysis of Amazon Reviews with NLP

The sentiment analysis problem

Getting started

Understanding the Amazon reviews dataset

Building a text sentiment classifier with the BoW approach

Understanding word embedding

Building a text sentiment classifier with pretrained word2vec word embedding based on Reuters news corpus

Building a text sentiment classifier with GloVe word embedding

Building a text sentiment classifier with fastText

Summary

Customer Segmentation Using Wholesale Data

Understanding customer segmentation

Understanding the wholesale customer dataset and the segmentation problem

Identifying the customer segments in wholesale customer data using k-means clustering

Identifying the customer segments in the wholesale customer data using DIANA

Identifying the customer segments in the wholesale customers data using AGNES

Summary

Image Recognition Using Deep Neural Networks

Technical requirements

Understanding computer vision

Achieving computer vision with deep learning

Introduction to the MXNet framework

Understanding the MNIST dataset

Implementing a deep learning network for handwritten digit recognition

Implementing computer vision with pretrained models

Summary

Credit Card Fraud Detection Using Autoencoders

Machine learning in credit card fraud detection

Autoencoders explained

The credit card fraud dataset

Building AEs with the H2O library in R

Summary

Automatic Prose Generation with Recurrent Neural Networks

Understanding language models

Exploring recurrent neural networks

Backpropagation through time

Problems and solutions to gradients in RNN

Building an automated prose generator with an RNN

Summary

Winning the Casino Slot Machines with Reinforcement Learning

Understanding RL

Multi-arm bandit – real-world use cases

Solving the MABP with UCB and Thompson sampling algorithms

Summary

Creating a Package

Creating a new package

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think

Index

Preface

R is one of the most popular languages when it comes to exploring the mathematical side of machine learning and easily performing computational statistics.

This Learning Path shows you how to leverage the R ecosystem to build efficient machine learning applications that carry out intelligent tasks within your organization. You'll tackle realistic projects, such as building powerful ensemble models to predict employee attrition. You'll explore different clustering techniques to segment customers using wholesale data and use TensorFlow and Keras-R to perform advanced computations. Each chapter will help you implement advanced machine learning algorithms using real-world examples. You’ll also be introduced to reinforcement learning along with its various use cases and models. Additionally, this Learning Path gives you a glimpse into how some of these black-box models can be diagnosed and understood.

By the end of this Learning Path, you’ll be equipped with the skills you need to deploy machine learning techniques in your own projects.

Who this book is for

If you’re a data analyst, data scientist, or machine learning developer who wants to master machine learning techniques using R, this is an ideal Learning Path for you. Each project will help you test your skills in implementing machine learning algorithms and techniques. A basic understanding of machine learning and a working knowledge of R programming are necessary to get the most out of this Learning Path.

What this book covers

Chapter 1, Preparing and Understanding Data, covers the loading of data and demonstrates how to obtain an understanding of its structure and dimensions, as well as how to install the necessary packages.

Chapter 2, Linear Regression, provides a solid foundation for the advanced methods that follow, such as support vector machines and gradient boosting. No more solid foundation exists than least squares linear regression.

Chapter 3, Logistic Regression, discusses how logistic regression and discriminant analysis are used to predict a categorical outcome. It also covers multivariate adaptive regression splines, a technique that performs well, handles nonlinearity, and is easy to explain.

Chapter 4, Advanced Feature Selection in Linear Models, shows how regularization techniques can help improve predictive ability and interpretability, since feature selection is a critical and often extremely challenging component of machine learning. It covers techniques for both regression and classification problems.

Chapter 5, K-Nearest Neighbors and Support Vector Machines, begins the exploration of more advanced, nonlinear techniques, where the real power of machine learning will be unveiled.

Chapter 6, Tree-Based Classification, covers techniques with some of the most powerful predictive abilities in machine learning, especially for classification problems. Single decision trees are discussed along with the more advanced random forests and boosted trees. The chapter also covers very popular techniques provided by the xgboost package.

Chapter 7, Neural Networks and Deep Learning, covers some of the most exciting machine learning methods in current use. Neural networks, which are inspired by how the brain works, and their more recent and advanced offshoot, deep learning, will be put to the test. The chapter also includes code for the H2O package, including a hyperparameter search.

Chapter 8, Creating Ensembles and Multiclass Methods, is entirely new content and makes use of several great packages.

Chapter 9, Cluster Analysis, covers unsupervised learning. Instead of making a prediction, the goal is to uncover the latent structure of the observations. Three clustering methods will be discussed: hierarchical, k-means, and partitioning around medoids. The chapter also includes the methodology for executing unsupervised learning with random forests.

Chapter 10, Principal Component Analysis, continues the examination of unsupervised learning with principal component analysis, which is used to uncover the latent structure of the features. The resulting new features are then used in a supervised learning exercise.

Chapter 11, Association Analysis, explains association analysis, a technique that applies not only to making recommendations, product placement, and promotional pricing but also to manufacturing, web usage, and healthcare.

Chapter 12, Time Series and Causality, discusses univariate forecast models, bivariate regression, and Granger causality models, including an analysis of carbon emissions and climate change, along with a demonstration of different causality test methods.

Chapter 13, Text Mining, demonstrates a framework for quantitative text mining and the building of topic models. Along with time series, much of the world's data comes in textual format. With so much data stored as text, it is critically important to understand how to manipulate, code, and analyze it in order to provide meaningful insights.

Chapter 14, Exploring the Machine Learning Landscape, briefly reviews the various ML concepts that a practitioner must know. In this chapter, we will cover topics such as supervised learning, reinforcement learning, unsupervised learning, and real-world ML use cases.

Chapter 15, Predicting Employee Attrition Using Ensemble Models, covers the creation of powerful ML models through ensemble learning. We will introduce the problem at hand and then explore the dataset with exploratory data analysis (EDA). Then, in the preprocessing phase, we will create new features using prior domain experience. Once the dataset is fully prepared, models will be created using multiple ensemble techniques, such as bagging, boosting, stacking, and randomization. Lastly, we will deploy the final selected model to production.

Chapter 16, Implementing a Jokes Recommendation Engine, introduces recommendation engines. We start by understanding the concepts and types of collaborative filtering algorithms. We will then build a recommendation engine that provides personalized joke recommendations using collaborative filtering approaches such as user-based and item-based collaborative filtering. We will also explore various libraries available in R for building recommendation systems.

Chapter 17, Sentiment Analysis of Amazon Reviews with NLP, covers sentiment analysis, which entails determining the sentiment of a sentence and labeling it as positive, negative, or neutral, along with the various techniques that can be used to analyze text. We will learn text-mining concepts and the various ways that text is labeled based on its tone. In addition to using popular R text-mining libraries to preprocess the reviews to be classified, we will leverage a wide range of text representations, such as bag of words (BoW), word2vec, fastText, and GloVe.

Chapter 18, Customer Segmentation Using Wholesale Data, covers the segmentation, grouping, or clustering of customers, which can be achieved through unsupervised learning. In this chapter, we learn various techniques for customer segmentation. We will apply advanced clustering techniques, such as k-means, DIANA, and AGNES. Because the number of customer groups is often not known in advance, we will explore ML techniques for dealing with this ambiguity and letting ML find the number of groups based on the underlying characteristics of the input data. Evaluating the output of clustering algorithms is an area that practitioners often find challenging.

Chapter 19, Image Recognition Using Deep Neural Networks, covers convolutional neural networks (CNNs). We explore why CNNs work so well for computer vision problems such as object detection. We will learn these concepts by applying a CNN to build a multiclass classification model on MNIST, a popular open dataset. We will also learn about various preprocessing techniques that can be applied to image data so that it can be used with deep learning models.

Chapter 20, Credit Card Fraud Detection Using Autoencoders, covers autoencoders and how they differ from other deep learning networks, such as recurrent neural networks (RNNs) and CNNs. We will learn about autoencoders by implementing a project that identifies credit card fraud. We will become familiar with dimensionality reduction and how it can be used to detect credit card fraud.

Chapter 21, Automatic Prose Generation with Recurrent Neural Networks, introduces some deep neural networks (DNNs). We will implement a neural network from scratch and learn how to apply an RNN through a project: an application that generates text automatically using a long short-term memory (LSTM) network, a variant of RNNs. To accomplish this task, we use the MXNet framework, which supports deep learning in R.

Chapter 22, Winning the Casino Slot Machines with Reinforcement Learning, begins with an explanation of reinforcement learning (RL). We discuss the various concepts of RL, including strategies for solving what is called the multi-arm bandit problem. We then implement a project that uses the upper confidence bound (UCB) and Thompson sampling techniques to solve the multi-arm bandit problem.

Appendix, Creating a Package, includes additional data packages.

To get the most out of this book

Assuming the reader has a working knowledge of R and of basic statistics, this book will provide the skills and tools required to get the reader up and running with R and ML as quickly and painlessly as possible. There will probably always be detractors who complain that it does not offer enough math, or does not do this, that, or the other thing, but my answer is that such books already exist! Why duplicate what has already been done, and done very well? Again, I have sought to provide something different, something to hold the reader's attention and allow them to succeed in this competitive and rapidly changing field.

The projects covered in this book are intended to give you practical knowledge of applying various ML techniques to real-world problems. A good working knowledge of R and a basic understanding of ML are a must before starting these projects.

It should also be noted that the code for the projects was implemented using R version 3.5.2 (2018-12-20), nicknamed Eggshell Igloo, and has been successfully tested on Linux Mint 18.3 Sylvia. There is no reason to believe that the code will not work on other platforms, such as Windows; however, the author has not tested this.

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

Log in or register at www.packt.com.
Select the SUPPORT tab.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Advanced-Machine-Learning-with-R. If the code is updated, the changes will be made in the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in the text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system."

A block of code is set as follows:

html, body, #map {
 height: 100%;
 margin: 0;
 padding: 0;
}

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

[default]
exten => s,1,Dial(Zap/1|30)
exten => s,2,Voicemail(u100)
exten => s,102,Voicemail(b100)
exten => i,1,Voicemail(s0)

Any command-line input or output is written as follows:

$ mkdir css
$ cd css

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."

Note

Warnings or important notes appear like this.

Note

Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packt.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.
