Learning Predictive Analytics with R

By: Eric Mayor

Overview of this book

This book is packed with easy-to-follow guidelines that explain the workings of the key data mining tools of R, which are used to discover knowledge from your data. You will learn how to perform key predictive analytics tasks using R, such as training and testing predictive models for classification and regression tasks and scoring new data sets. All chapters guide you in acquiring these skills in a practical way. Most chapters also include a theoretical introduction that will sharpen your understanding of the subject matter and invite you to go further. The book familiarizes you with the most common data mining tools of R, such as k-means, hierarchical regression, linear regression, association rules, principal component analysis, multilevel modeling, k-NN, Naïve Bayes, decision trees, and text mining. It also describes visualization techniques using the basic visualization tools of R, as well as lattice for visualizing patterns in data organized in groups. This book is invaluable for anyone fascinated by the data mining opportunities offered by GNU R and its packages.

Credits

About the Author

About the Reviewers

www.PacktPub.com

Preface

Setting GNU R for Predictive Analytics

Installing GNU R

The R graphic user interface

The menu bar of the R console

Packages

Summary

Visualizing and Manipulating Data Using R

The roulette case

Histograms and bar plots

Scatterplots

Boxplots

Line plots

Application – Outlier detection

Formatting plots

Summary

Data Visualization with Lattice

Loading and discovering the lattice package

Discovering multipanel conditioning with xyplot()

Discovering other lattice plots

Updating graphics

Case study – exploring cancer-related deaths in the US

Summary

Cluster Analysis

Distance measures

Learning by doing – partition clustering with kmeans()

Using k-means with public datasets

Summary

Agglomerative Clustering Using hclust()

The inner working of agglomerative clustering

Agglomerative clustering with hclust()

Summary

Dimensionality Reduction with Principal Component Analysis

The inner working of Principal Component Analysis

Learning PCA in R

Summary

Exploring Association Rules with Apriori

Apriori – basic concepts

The inner working of apriori

Analyzing data with apriori in R

Summary

Probability Distributions, Covariance, and Correlation

Probability distributions

Covariance and correlation

Summary

Linear Regression

Understanding simple regression

Working with multiple regression

Analyzing data in R: correlation and regression

Robust regression

Bootstrapping

Summary

Classification with k-Nearest Neighbors and Naïve Bayes

Understanding k-NN

Working with k-NN in R

Understanding Naïve Bayes

Working with Naïve Bayes in R

Computing the performance of classification

Summary

Classification Trees

Understanding decision trees

ID3

C4.5

C5.0

Classification and regression trees and random forest

Conditional inference trees and forests

Installing the packages containing the required functions

Performing the analyses in R

Caret – a unified framework for classification

Summary

Multilevel Analyses

Nested data

Multilevel regression

Multilevel modeling in R

Predictions using multilevel models

Summary

Text Analytics with R

An introduction to text analytics

Loading the corpus

Data preparation

Creating the training and testing data frames

Classification of the reviews

Mining the news with R

Summary

Cross-validation and Bootstrapping Using Caret and Exporting Predictive Models Using PMML

Cross-validation and bootstrapping of predictive models using the caret package

Exporting models using PMML

Summary

Exercises and Solutions

Exercises

Solutions

Distance measures

Partitioning clustering algorithms iteratively define k cluster centers and assign cluster membership (or the probability of group membership) to cases based on the distance between each case and the cluster centers. Agglomerative clustering algorithms also create clusters based on distances, starting with each individual belonging to a separate cluster and then grouping clusters two by two. The k-nearest neighbors algorithm also uses distance measures.
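The two clustering families described above can be sketched in a few lines of R. This is only an illustration under stated assumptions: the `toy` data set is hypothetical (two artificial groups of points), and the choices of `k = 2`, `nstart = 10`, and complete linkage are arbitrary rather than taken from the book.

```r
# Hypothetical toy data: two well-separated groups of 10 points in 2 dimensions
set.seed(42)
toy <- rbind(
  matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 5, sd = 0.3), ncol = 2)
)

# Partitioning clustering: kmeans() iteratively updates k cluster centers
# and assigns each case to the nearest center
km <- kmeans(toy, centers = 2, nstart = 10)
km$cluster   # cluster membership of each case
km$centers   # final cluster centers

# Agglomerative clustering: hclust() starts with one cluster per case
# and merges the two closest clusters at each step
d <- dist(toy)                          # pairwise Euclidean distances (default)
hc <- hclust(d, method = "complete")    # complete linkage is an arbitrary choice
hc_groups <- cutree(hc, k = 2)          # cut the tree to obtain two clusters
table(km$cluster, hc_groups)            # compare the two solutions
```

Both functions rely on distances between cases: `kmeans()` computes them internally, while `hclust()` takes a distance object produced by `dist()`.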

Consider only one attribute, for instance the height of individuals. The distance between someone measuring 180 cm and someone measuring 170 cm is 10 on this sole dimension, taking the algebraic difference between the two measures as our distance metric. Things get a little more complicated when we add more attributes, such as weight (we will not consider variable scaling here). Let's say the first individual is clearly overweight (90 kg), and the second has a normal weight (80 kg). Considering only the sum of the difference between...
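The height and weight example can be reproduced with `dist()` from base R. This is a minimal sketch: the object name `people` is hypothetical, and the Euclidean distance is shown only for comparison, as an assumption, since the excerpt breaks off before naming a metric for the two-attribute case.

```r
# The two individuals from the example (height in cm, weight in kg)
people <- rbind(
  first  = c(height = 180, weight = 90),
  second = c(height = 170, weight = 80)
)

# One attribute only: absolute difference in height
abs(people["first", "height"] - people["second", "height"])  # 10

# Two attributes: sum of the absolute differences (Manhattan distance)
dist(people, method = "manhattan")  # 10 + 10 = 20

# Two attributes: Euclidean distance, shown for comparison
dist(people, method = "euclidean")  # sqrt(10^2 + 10^2), about 14.14
```

No scaling is applied here, in line with the text, so a difference of 10 cm and a difference of 10 kg contribute equally to both distances.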
