Advanced Machine Learning with R

By: Cory Lesmeister, Dr. Sunil Kumar Chinnamgari

Overview of this book

R is one of the most popular languages when it comes to exploring the mathematical side of machine learning and easily performing computational statistics. This Learning Path shows you how to leverage the R ecosystem to build efficient machine learning applications that carry out intelligent tasks within your organization. You'll work through realistic projects such as building powerful ensemble models to predict employee attrition. Next, you'll explore different clustering techniques to segment customers using wholesale data and even apply TensorFlow and Keras-R to perform advanced computations. Each chapter will help you implement advanced machine learning algorithms using real-world examples. You'll also be introduced to reinforcement learning along with its use cases and models. Finally, this Learning Path will give you a glimpse into how some of these black box models can be diagnosed and understood. By the end of this Learning Path, you'll be equipped with the skills you need to deploy machine learning techniques in your own projects.

Title Page

Copyright and Credits

About Packt

Contributors

Preface

Preparing and Understanding Data

Reading the data

Handling duplicate observations

Handling missing values

Zero and near-zero variance features

Treating the data

Linear Regression

Univariate linear regression

Multivariate linear regression

Logistic Regression

Classification methods and linear regression

Logistic regression

Model training and evaluation

Advanced Feature Selection in Linear Models

Regularization overview

Modeling and evaluation

K-Nearest Neighbors and Support Vector Machines

K-nearest neighbors

Support vector machines

Manipulating data

Modeling and evaluation

Tree-Based Classification

An overview of the techniques

Datasets and modeling

Neural Networks and Deep Learning

Introduction to neural networks

Deep learning – a not-so-deep overview

Creating a simple neural network

An example of deep learning

Creating Ensembles and Multiclass Methods

Data understanding

Modeling and evaluation

Cluster Analysis

Hierarchical clustering

K-means clustering

Dataset background

Data understanding and preparation

Principal Component Analysis

An overview of the principal components

Association Analysis

An overview of association analysis

Data understanding

Data preparation

Modeling and evaluation

Time Series and Causality

Univariate time series analysis

Time series data

Modeling and evaluation

Text Mining

Text mining framework and methods

Sentiment analysis

Classifying text

Additional quantitative analysis

Exploring the Machine Learning Landscape

ML versus software engineering

Types of ML methods

ML terminology – a quick review

ML project pipeline

Learning paradigm

Predicting Employee Attrition Using Ensemble Models

Philosophy behind ensembling

Getting started

Understanding the attrition problem and the dataset

K-nearest neighbors model for benchmarking the performance

Randomization with random forests

Implementing a Jokes Recommendation Engine

Fundamental aspects of recommendation engines

Getting started

Understanding the Jokes recommendation problem and the dataset

Building a recommendation system with an item-based collaborative filtering technique

Building a recommendation system with a user-based collaborative filtering technique

Building a recommendation system based on an association-rule mining technique

Content-based recommendation engine

Building a hybrid recommendation system for Jokes recommendations

Sentiment Analysis of Amazon Reviews with NLP

The sentiment analysis problem

Getting started

Understanding the Amazon reviews dataset

Building a text sentiment classifier with the BoW approach

Understanding word embedding

Building a text sentiment classifier with pretrained word2vec word embedding based on Reuters news corpus

Building a text sentiment classifier with GloVe word embedding

Building a text sentiment classifier with fastText

Customer Segmentation Using Wholesale Data

Understanding customer segmentation

Understanding the wholesale customer dataset and the segmentation problem

Identifying the customer segments in wholesale customer data using k-means clustering

Identifying the customer segments in the wholesale customer data using DIANA

Identifying the customer segments in the wholesale customers data using AGNES

Image Recognition Using Deep Neural Networks

Technical requirements

Understanding computer vision

Achieving computer vision with deep learning

Introduction to the MXNet framework

Understanding the MNIST dataset

Implementing a deep learning network for handwritten digit recognition

Implementing computer vision with pretrained models

Credit Card Fraud Detection Using Autoencoders

Machine learning in credit card fraud detection

Autoencoders explained

The credit card fraud dataset

Building AEs with the H2O library in R

Automatic Prose Generation with Recurrent Neural Networks

Understanding language models

Exploring recurrent neural networks

Backpropagation through time

Problems and solutions to gradients in RNN

Building an automated prose generator with an RNN

Winning the Casino Slot Machines with Reinforcement Learning

Understanding RL

Multi-arm bandit – real-world use cases

Solving the MABP with UCB and Thompson sampling algorithms

Creating a Package

Creating a new package

Other Books You May Enjoy

Leave a review - let other readers know what you think

Index

Chapter 1. Preparing and Understanding Data

"We've got to use every piece of data and piece of information, and hopefully that will help us be accurate with our player evaluation. For us, that's our lifeblood."

– Billy Beane, General Manager of the Oakland Athletics and subject of the book Moneyball

Research consistently shows that machine learning and data science practitioners spend most of their time manipulating data and preparing it for analysis. Indeed, many find it the most tedious and least enjoyable part of their work. Numerous companies offer solutions to this problem, but in my opinion the results so far are mixed. Therefore, in this first chapter, I shall endeavor to provide a way of tackling the problem that will ease the burden of getting your data ready for machine learning. The methodology introduced in this chapter will serve as the foundation for data preparation and for understanding many of the subsequent chapters. I propose that once you become comfortable with this tried-and-true process, it may very well become your favorite part of machine learning, as it is for me.

The following are the topics that we'll cover in this chapter:

Overview
Reading the data
Handling duplicate observations
Descriptive statistics
Exploring categorical variables
Handling missing values
Zero and near-zero variance features
Treating the data
Correlation and linearity