Learning Predictive Analytics with R

Learning Predictive Analytics with R

By : Eric Mayor

Buy this Book

Learning Predictive Analytics with R

By: Eric Mayor

Buy this Book

Overview of this book

This book is packed with easy-to-follow guidelines that explain the workings of the many key data mining tools of R, which are used to discover knowledge from your data. You will learn how to perform key predictive analytics tasks using R, such as train and test predictive models for classification and regression tasks, score new data sets and so on. All chapters will guide you in acquiring the skills in a practical way. Most chapters also include a theoretical introduction that will sharpen your understanding of the subject matter and invite you to go further. The book familiarizes you with the most common data mining tools of R, such as k-means, hierarchical regression, linear regression, association rules, principal component analysis, multilevel modeling, k-NN, Naïve Bayes, decision trees, and text mining. It also provides a description of visualization techniques using the basic visualization tools of R as well as lattice for visualizing patterns in data organized in groups. This book is invaluable for anyone fascinated by the data mining opportunities offered by GNU R and its packages.

Learning Predictive Analytics with R

Credits

About the Author

About the Reviewers

www.PacktPub.com

Preface

Free Chapter

Setting GNU R for Predictive Analytics

Installing GNU R

The R graphic user interface

The menu bar of the R console

Packages

Summary

Visualizing and Manipulating Data Using R

The roulette case

Histograms and bar plots

Scatterplots

Boxplots

Line plots

Application – Outlier detection

Formatting plots

Summary

Data Visualization with Lattice

Loading and discovering the lattice package

Discovering multipanel conditioning with xyplot()

Discovering other lattice plots

Updating graphics

Case study – exploring cancer-related deaths in the US

Summary

Cluster Analysis

Distance measures

Learning by doing – partition clustering with kmeans()

Using k-means with public datasets

Summary

Agglomerative Clustering Using hclust()

The inner working of agglomerative clustering

Agglomerative clustering with hclust()

Summary

Dimensionality Reduction with Principal Component Analysis

The inner working of Principal Component Analysis

Learning PCA in R

Summary

Exploring Association Rules with Apriori

Apriori – basic concepts

The inner working of apriori

Analyzing data with apriori in R

Summary

Probability Distributions, Covariance, and Correlation

Probability distributions

Covariance and correlation

Summary

Linear Regression

Understanding simple regression

Working with multiple regression

Analyzing data in R: correlation and regression

Robust regression

Bootstrapping

Summary

Classification with k-Nearest Neighbors and Naïve Bayes

Understanding k-NN

Working with k-NN in R

Understanding Naïve Bayes

Working with Naïve Bayes in R

Computing the performance of classification

Summary

Classification Trees

Understanding decision trees

ID3

C4.5

C5.0

Classification and regression trees and random forest

Conditional inference trees and forests

Installing the packages containing the required functions

Performing the analyses in R

Caret – a unified framework for classification

Summary

Multilevel Analyses

Nested data

Multilevel regression

Multilevel modeling in R

Predictions using multilevel models

Summary

Text Analytics with R

An introduction to text analytics

Loading the corpus

Data preparation

Creating the training and testing data frames

Classification of the reviews

Mining the news with R

Summary

Cross-validation and Bootstrapping Using Caret and Exporting Predictive Models Using PMML

Cross-validation and bootstrapping of predictive models using the caret package

Exporting models using PMML

Summary

Exercises and Solutions

Exercises

Solutions

Classification of the reviews

At the beginning of this section, we will try to classify the corpus using algorithms we have already discussed (Naïve Bayes and k-NN). We will then briefly discuss two new algorithms: logistic regression and support vector machines.

Document classification with k-NN

We know k-Nearest Neighbors, so we'll just jump into the classification. We will try with three neighbors and five neighbors:

1  library(class) # knn() is in the class packages
2  library(caret) # confusionMatrix is in the caret package
3  set.seed(975)
4  Class3n = knn(TrainDF[,-1], TrainDF[,-1], TrainDF[,1], k = 3)
5  Class5n = knn(TrainDF[,-1], TrainDF[,-1], TrainDF[,1], k = 5)
6  confusionMatrix(Class3n,as.factor(TrainDF$quality))

The confusion matrix and the following statistics (the output has been partially reproduced) show that classification with three neighbors doesn't seem too bad: the accuracy is 0.74; yet, the kappa value is not good (it should be at least 0.60):

Confusion Matrix and Statistics...

Learning Predictive Analytics with R

By : Eric Mayor

Learning Predictive Analytics with R

By: Eric Mayor

Overview of this book

Related Content you might be interested in

Current Title:

Learning Predictive Analytics with R

Classification of the reviews

Document classification with k-NN