Learning Predictive Analytics with R

By : Eric Mayor

Overview of this book

This book is packed with easy-to-follow guidelines that explain the workings of the many key data mining tools of R, which are used to discover knowledge from your data. You will learn how to perform key predictive analytics tasks using R, such as training and testing predictive models for classification and regression tasks, scoring new data sets, and so on. All chapters will guide you in acquiring the skills in a practical way. Most chapters also include a theoretical introduction that will sharpen your understanding of the subject matter and invite you to go further. The book familiarizes you with the most common data mining tools of R, such as k-means, hierarchical regression, linear regression, association rules, principal component analysis, multilevel modeling, k-NN, Naïve Bayes, decision trees, and text mining. It also describes visualization techniques using the basic visualization tools of R, as well as lattice for visualizing patterns in data organized in groups. This book is invaluable for anyone fascinated by the data mining opportunities offered by GNU R and its packages.

Credits

About the Author

About the Reviewers

www.PacktPub.com

Preface

Setting GNU R for Predictive Analytics

Installing GNU R

The R graphic user interface

The menu bar of the R console

Packages

Summary

Visualizing and Manipulating Data Using R

The roulette case

Histograms and bar plots

Scatterplots

Boxplots

Line plots

Application – Outlier detection

Formatting plots

Summary

Data Visualization with Lattice

Loading and discovering the lattice package

Discovering multipanel conditioning with xyplot()

Discovering other lattice plots

Updating graphics

Case study – exploring cancer-related deaths in the US

Summary

Cluster Analysis

Distance measures

Learning by doing – partition clustering with kmeans()

Using k-means with public datasets

Summary

Agglomerative Clustering Using hclust()

The inner working of agglomerative clustering

Agglomerative clustering with hclust()

Summary

Dimensionality Reduction with Principal Component Analysis

The inner working of Principal Component Analysis

Learning PCA in R

Summary

Exploring Association Rules with Apriori

Apriori – basic concepts

The inner working of apriori

Analyzing data with apriori in R

Summary

Probability Distributions, Covariance, and Correlation

Probability distributions

Covariance and correlation

Summary

Linear Regression

Understanding simple regression

Working with multiple regression

Analyzing data in R: correlation and regression

Robust regression

Bootstrapping

Summary

Classification with k-Nearest Neighbors and Naïve Bayes

Understanding k-NN

Working with k-NN in R

Understanding Naïve Bayes

Working with Naïve Bayes in R

Computing the performance of classification

Summary

Classification Trees

Understanding decision trees

ID3

C4.5

C5.0

Classification and regression trees and random forest

Conditional inference trees and forests

Installing the packages containing the required functions

Performing the analyses in R

Caret – a unified framework for classification

Summary

Multilevel Analyses

Nested data

Multilevel regression

Multilevel modeling in R

Predictions using multilevel models

Summary

Text Analytics with R

An introduction to text analytics

Loading the corpus

Data preparation

Creating the training and testing data frames

Classification of the reviews

Mining the news with R

Summary

Cross-validation and Bootstrapping Using Caret and Exporting Predictive Models Using PMML

Cross-validation and bootstrapping of predictive models using the caret package

Exporting models using PMML

Summary

Exercises and Solutions

Exercises

Solutions

Application – Outlier detection

You might remember that at the beginning of the chapter, we noticed in the stacked bar plot that, in our sample of 1,000 roulette spins, the zero was drawn about twice as often as we would expect. We only mentioned it at the time because we had no point of comparison. We now have proportions from 100 samples and can examine this a little further. The proportion of zeros can be obtained from the data we already have: for each sample, we simply subtract the sum of the proportions of red and black numbers from 1. Let's do this, add the result to the data frame as a new attribute, and compute the mean of this proportion:

samples$isZero = 1-(samples$isRed+samples$isBlack)
Mean = mean(samples$isZero)
Mean

The mean value is 0.0277. The value we would expect is 1/37, which is 0.0270. The average proportion of zeros across our 100 samples is therefore almost identical to the expected value. This in no way means that there are no outliers: an average close to the expected value can still hide individual samples that deviate from it.
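The samples data frame used above is built earlier in the chapter and is not shown in this excerpt. For readers who want to try the computation on their own, the following is a minimal sketch that simulates comparable data. It assumes a European wheel with 18 red pockets, 18 black pockets, and a single zero (consistent with the 1/37 mentioned above); the seed and the names pockets, n_samples, n_spins, and expected are illustrative choices, not taken from the book, so the resulting numbers will differ from the 0.0277 reported here.

```r
# Hypothetical reconstruction: simulates data shaped like the chapter's
# `samples` data frame (the book's own construction is not shown here).
set.seed(1)        # arbitrary seed, for reproducibility only
n_samples <- 100   # number of samples, as in the text
n_spins   <- 1000  # spins per sample, as in the text

# Assumed wheel layout: 18 red, 18 black, and one zero (37 pockets).
pockets <- c(rep("red", 18), rep("black", 18), "zero")

# One row per sample, holding the proportions of red and black draws.
samples <- as.data.frame(t(replicate(n_samples, {
  spins <- sample(pockets, n_spins, replace = TRUE)
  c(isRed = mean(spins == "red"), isBlack = mean(spins == "black"))
})))

# Same computation as in the text: whatever is not red or black is zero.
samples$isZero <- 1 - (samples$isRed + samples$isBlack)

expected <- 1 / 37      # expected proportion of zeros
mean(samples$isZero)    # average proportion across the simulated samples
range(samples$isZero)   # smallest and largest proportion in a single sample
```

Comparing the output of mean() with that of range() illustrates the point made above: the average can sit close to the expected value while individual samples lie noticeably further away.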

There...
