R Data Analysis Cookbook

By: Viswa Viswanathan, Shanthi Viswanathan

Overview of this book

Data analytics with R has emerged as a very important focus for organizations of all kinds. R enables even those with only an intuitive grasp of the underlying concepts, without a deep mathematical background, to unleash powerful and detailed examinations of their data.

This book empowers you by showing you ways to use R to generate professional analysis reports. It provides examples for various important analysis and machine-learning tasks that you can try out with associated and readily available data. The book also teaches you to quickly adapt the example code for your own needs and save yourself the time needed to construct code from scratch.

Credits

About the Authors

About the Reviewers

www.PacktPub.com

Preface

Acquire and Prepare the Ingredients – Your Data

Introduction

Reading data from CSV files

Reading XML data

Reading JSON data

Reading data from fixed-width formatted files

Reading data from R files and R libraries

Removing cases with missing values

Replacing missing values with the mean

Removing duplicate cases

Rescaling a variable to [0,1]

Normalizing or standardizing data in a data frame

Binning numerical data

Creating dummies for categorical variables

What's in There? – Exploratory Data Analysis

Introduction

Creating standard data summaries

Extracting a subset of a dataset

Splitting a dataset

Creating random data partitions

Generating standard plots such as histograms, boxplots, and scatterplots

Generating multiple plots on a grid

Selecting a graphics device

Creating plots with the lattice package

Creating plots with the ggplot2 package

Creating charts that facilitate comparisons

Creating charts that help visualize a possible causality

Creating multivariate plots

Where Does It Belong? – Classification

Introduction

Generating error/classification-confusion matrices

Generating ROC charts

Building, plotting, and evaluating – classification trees

Using random forest models for classification

Classifying using Support Vector Machine

Classifying using the Naïve Bayes approach

Classifying using the KNN approach

Using neural networks for classification

Classifying using linear discriminant function analysis

Classifying using logistic regression

Using AdaBoost to combine classification tree models

Give Me a Number – Regression

Introduction

Computing the root mean squared error

Building KNN models for regression

Performing linear regression

Performing variable selection in linear regression

Building regression trees

Building random forest models for regression

Using neural networks for regression

Performing k-fold cross-validation

Performing leave-one-out-cross-validation to limit overfitting

Can You Simplify That? – Data Reduction Techniques

Introduction

Performing cluster analysis using K-means clustering

Performing cluster analysis using hierarchical clustering

Reducing dimensionality with principal component analysis

Lessons from History – Time Series Analysis

Introduction

Creating and examining date objects

Operating on date objects

Performing preliminary analyses on time series data

Using time series objects

Decomposing time series

Filtering time series data

Smoothing and forecasting using the Holt-Winters method

Building an automated ARIMA model

It's All About Your Connections – Social Network Analysis

Introduction

Downloading social network data using public APIs

Creating adjacency matrices and edge lists

Plotting social network data

Computing important network metrics

Put Your Best Foot Forward – Document and Present Your Analysis

Introduction

Generating reports of your data analysis with R Markdown and knitr

Creating interactive web applications with shiny

Creating PDF presentations of your analysis with R Presentation

Work Smarter, Not Harder – Efficient and Elegant R Code

Introduction

Exploiting vectorized operations

Processing entire rows or columns using the apply function

Applying a function to all elements of a collection with lapply and sapply

Applying functions to subsets of a vector

Using the split-apply-combine strategy with plyr

Slicing, dicing, and combining data with data tables

Where in the World? – Geospatial Analysis

Introduction

Downloading and plotting a Google map of an area

Overlaying data on the downloaded Google map

Importing ESRI shape files into R

Using the sp package to plot geographic data

Getting maps from the maps package

Creating spatial data frames from regular data frames containing spatial and other data

Creating spatial data frames by combining regular data frames with spatial objects

Adding variables to an existing spatial data frame

Playing Nice – Connecting to Other Systems

Introduction

Using Java objects in R

Using JRI to call R functions from Java

Using Rserve to call R functions from Java

Executing R scripts from Java

Using the xlsx package to connect to Excel

Reading data from relational databases – MySQL

Reading data from NoSQL databases – MongoDB

Index

Performing cluster analysis using K-means clustering

The standard R package stats provides the function for K-means clustering. We also use the cluster package to plot the results of our cluster analysis.
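As a rough illustration of the two packages mentioned above, the following minimal sketch runs `kmeans()` from stats on a small made-up dataset. The toy data, the variable names `toy` and `fit`, and the choice of two clusters are assumptions for illustration only and are not taken from the recipe. The plotting call uses `clusplot()` from the cluster package; whether the recipe uses that particular function is also an assumption.

```r
# Toy data (hypothetical): two well-separated groups of 20 points each
set.seed(42)
toy <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2)
)
colnames(toy) <- c("x", "y")

# kmeans() lives in the stats package, which R loads by default.
# nstart = 10 tries several random starts and keeps the best solution.
fit <- kmeans(toy, centers = 2, nstart = 10)

fit$size     # number of points assigned to each cluster
fit$centers  # coordinates of the cluster centroids

# clusplot() from the cluster package plots the clusters on the first two
# principal components. The guard lets the sketch run without the package.
if (requireNamespace("cluster", quietly = TRUE)) {
  cluster::clusplot(toy, fit$cluster, color = TRUE, shade = TRUE, lines = 0)
}
```

The cluster labels themselves (1 or 2) can differ between runs or seeds, so it is safer to compare cluster sizes and centroids than label values.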

Getting ready

If you have not already downloaded the files for this chapter, do it now and ensure that the auto-mpg.csv file is in your R working directory. Also, ensure that you have installed the cluster package.

How to do it...

To perform cluster analysis using K-means clustering, follow these steps:

Read the data:
```
> auto <- read.csv("auto-mpg.csv")
```

Define a convenience function to standardize the relevant variables and append the resulting variables to the original data:

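```
rdacb.scale.many <- function (dat, column_nos) {
  nms <- names(dat)
  for (col in column_nos) {
    # Name the new column after the original, with a "_z" suffix
    name <- paste0(nms[col], "_z")
    # scale() centers the column on its mean and divides by its standard deviation
    dat[name] <- scale(dat[, col])
  }
  cat(paste("Scaled", length(column_nos), "variable(s)\n"))
  # Return the data frame with the standardized columns appended
  dat
}
```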

Use the preceding convenience function to standardize the variables...
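To make the standardization step concrete, here is a minimal sketch of what `scale()` computes with its default arguments, shown on a made-up vector. The vector `x` and the names `z_manual` and `z_scale` are hypothetical and not part of the recipe. Standardizing matters for K-means because the method relies on distances, so variables measured on larger scales would otherwise dominate the result.

```r
# Hypothetical values standing in for one numeric column of a data frame
x <- c(10, 20, 30, 40, 50)

# Standardize by hand: subtract the mean, divide by the standard deviation
z_manual <- (x - mean(x)) / sd(x)

# scale() does the same by default (center = TRUE, scale = TRUE);
# it returns a one-column matrix, so convert it back to a plain vector
z_scale <- as.numeric(scale(x))

# Both approaches should agree up to floating-point tolerance
all.equal(z_manual, z_scale)

# A standardized variable has mean 0 and standard deviation 1
mean(z_scale)
sd(z_scale)
```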
