We will now introduce classification and regression trees (CART) and random forest. CART uses different statistical criteria to decide on tree splits. Random forest uses ensemble learning (a combination of CART trees) to improve classification using a voting principle.
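The voting principle can be illustrated with a minimal sketch (not the full random forest algorithm, which also involves bootstrap sampling and random feature selection): each tree predicts a class for a sample, and the ensemble returns the majority class.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine the class predictions of several trees by majority vote."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical trees vote on one sample; "spam" wins 2 to 1.
print(majority_vote(["spam", "ham", "spam"]))  # spam
```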
There are a number of differences between CART used for classification and the family of algorithms we just discussed. Here we discuss only the partitioning criterion and pruning, and only superficially.
In CART, the attribute to partition on is selected using the Gini index as the decision criterion. In classification trees, the Gini index of a node is simply computed as one minus the sum of the squared class probabilities in that node. In formula notation:

Gini = 1 − ∑ᵢ pᵢ²

where pᵢ is the proportion of examples of class i in the node. The quality of a candidate split is then the weighted average of the Gini indices of the resulting partitions.
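As a short sketch, the formula above translates directly into code (the class labels here are illustrative):

```python
from collections import Counter

def gini(labels):
    """Gini index: 1 minus the sum of squared class probabilities."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())

# A pure node has Gini 0; a perfectly mixed binary node has Gini 0.5.
print(gini(["yes", "yes", "yes"]))       # 0.0
print(gini(["yes", "no", "yes", "no"]))  # 0.5
```

A lower Gini index therefore means a purer node, which is why CART prefers splits that minimize it.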
This is cheaper to compute than information gain and gain ratio. Note that CART does not necessarily create one branch per modality of the attribute: it can also merge modalities together into groups, producing a binary partition.
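This merging behavior can be sketched as follows, assuming a small categorical attribute: enumerate every way of grouping the modalities into two sets and keep the grouping with the lowest weighted Gini index (the data and function names are illustrative; real implementations use shortcuts to avoid the exponential enumeration).

```python
from collections import Counter
from itertools import combinations

def gini(labels):
    """Gini index: 1 minus the sum of squared class probabilities."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_binary_split(values, labels):
    """Try every grouping of modalities into two sets (a CART-style
    binary partition) and return the lowest weighted Gini with its group."""
    modalities = sorted(set(values))
    n = len(labels)
    best = None
    for r in range(1, len(modalities)):
        for left in combinations(modalities, r):
            # Fix the first modality on the left side to skip mirror splits.
            if modalities[0] not in left:
                continue
            left = set(left)
            l_lab = [y for v, y in zip(values, labels) if v in left]
            r_lab = [y for v, y in zip(values, labels) if v not in left]
            score = (len(l_lab) / n) * gini(l_lab) + (len(r_lab) / n) * gini(r_lab)
            if best is None or score < best[0]:
                best = (score, left)
    return best

# "blue" and "green" carry the same class, so merging them is optimal.
values = ["red", "red", "blue", "green", "green", "blue"]
labels = ["yes", "yes", "no", "no", "no", "no"]
print(best_binary_split(values, labels))
```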