Mastering Machine Learning with R

By: Cory Lesmeister

4.3 (6)

Overview of this book

Machine learning is a field of artificial intelligence concerned with building systems that learn from data. Given the growing prominence of R, a cross-platform, zero-cost statistical programming environment, there has never been a better time to start applying machine learning to your data. The book starts with an introduction to the Cross-Industry Standard Process for Data Mining. It then takes you through multivariate regression in detail. Moving on, you will work with classification and regression trees and learn a couple of unsupervised techniques. Finally, the book walks you through text analysis and time series. Along the way it offers practical, real-world solutions to a variety of tasks, such as building complex recommendation systems. By the end of this book, you will have gained expertise in machine learning with R and will be able to build complex ML projects using R and its packages.

Preface

Machine learning defined

Machine learning caveats

Failure to engineer features

Overfitting and underfitting

Causality

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

1. A Process for Success

The process

Business understanding

Data understanding

Data preparation

Modeling

Evaluation

Deployment

Algorithm flowchart

Summary

2. Linear Regression – The Blocking and Tackling of Machine Learning

Univariate linear regression

Multivariate linear regression

Other linear model considerations

Summary

3. Logistic Regression and Discriminant Analysis

Classification methods and linear regression

Logistic regression

Model selection

Summary

4. Advanced Feature Selection in Linear Models

Regularization in a nutshell

Business case

Modeling and evaluation

Model selection

Summary

5. More Classification Techniques – K-Nearest Neighbors and Support Vector Machines

K-Nearest Neighbors

Support Vector Machines

Business case

Feature selection for SVMs

Summary

6. Classification and Regression Trees

Introduction

An overview of the techniques

Business case

Summary

7. Neural Networks

Neural network

Deep learning, a not-so-deep overview

Business understanding

Data understanding and preparation

Modeling and evaluation

An example of deep learning

Summary

8. Cluster Analysis

Hierarchical clustering

K-means clustering

Gower and partitioning around medoids

Data understanding and preparation

Modeling and evaluation

Summary

9. Principal Components Analysis

An overview of the principal components

Modeling and evaluation

Summary

10. Market Basket Analysis and Recommendation Engines

An overview of a market basket analysis

Business understanding

Data understanding and preparation

Modeling and evaluation

An overview of a recommendation engine

Business understanding and recommendations

Data understanding, preparation, and recommendations

Modeling, evaluation, and recommendations

Summary

11. Time Series and Causality

Univariate time series analysis

Modeling and evaluation

Summary

12. Text Mining

Text mining framework and methods

Topic models

Modeling and evaluation

Summary

A. R Fundamentals

Introduction

Getting R up and running

Using R

Data frames and matrices

Summary stats

Installing and loading the R packages

Summary

Index

Model training and evaluation

As mentioned previously, we'll be predicting customer satisfaction. The data is based on a former online competition. I've taken the training portion of the data and cleaned it up for our use.

A full description of the contest and the data is available at the following link: https://www.kaggle.com/c/santander-customer-satisfaction/data.

This is an excellent dataset for a classification problem for many reasons. Like so much customer data, it's very messy, especially before I removed a bunch of useless features (there were something like four dozen zero-variance features). As discussed in the prior two chapters, I addressed missing values, linear dependencies, and highly correlated pairs. I also found the feature names lengthy and useless, so I coded them V1 through V142. The resulting data deals with what's usually a difficult...
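The cleanup steps mentioned above (dropping zero-variance features, linear dependencies, and highly correlated pairs) can be sketched in base R. This is only an illustration under stated assumptions, not the code used to prepare the book's data: the `toy` data frame is made up, the 0.9 correlation cutoff is an arbitrary choice, and the rule of dropping the later column of each correlated pair is one simple option among several.

```r
# Hypothetical toy data standing in for the raw features (not the contest data).
set.seed(123)
n <- 200
toy <- data.frame(
  V1 = rnorm(n),
  V2 = rnorm(n),
  V3 = rep(0, n)                        # zero-variance feature
)
toy$V4 <- toy$V1 + toy$V2               # exact linear combination of V1 and V2
toy$V5 <- toy$V1 + rnorm(n, sd = 0.01)  # near-duplicate of V1 (highly correlated)

# 1. Drop features whose variance is exactly zero.
zero_var <- vapply(toy, function(x) var(x) == 0, logical(1))
step1 <- toy[, !zero_var, drop = FALSE]

# 2. Drop linearly dependent columns. qr() moves columns that add no new
#    information to the end of the pivot order, so the first `rank` pivots
#    index a linearly independent set.
qr_fit <- qr(as.matrix(step1))
keep <- sort(qr_fit$pivot[seq_len(qr_fit$rank)])
step2 <- step1[, keep, drop = FALSE]

# 3. Drop the later column of any pair whose absolute correlation exceeds
#    the cutoff (0.9 is an assumed value, chosen for illustration).
cutoff <- 0.9
cor_mat <- abs(cor(step2))
cor_mat[upper.tri(cor_mat, diag = TRUE)] <- 0  # compare each column to earlier ones only
too_high <- apply(cor_mat, 1, function(r) any(r > cutoff))
step3 <- step2[, !too_high, drop = FALSE]

names(step3)
```

On the toy data, `V3` goes in step 1, `V4` in step 2, and `V5` in step 3, leaving `V1` and `V2`. Packages such as caret offer helpers for the same checks, which are more convenient on wide data.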
