R Data Analysis Cookbook - Second Edition

By: Kuntal Ganguly, Shanthi Viswanathan, Viswa Viswanathan

Overview of this book

Data analytics with R has emerged as a very important focus for organizations of all kinds. R enables even those with only an intuitive grasp of the underlying concepts, without a deep mathematical background, to carry out powerful and detailed examinations of their data. This book shows you how to put your data analysis skills in R to practical use, with recipes covering basic as well as advanced data analysis tasks. From acquiring your data and preparing it for analysis to the more complex data analysis techniques, the book shows you how to implement each technique effectively. You will also visualize your data using popular R packages such as ggplot2 and gain hidden insights from it. Starting with basic data analysis concepts such as handling your data and creating basic plots, you will move on to more advanced techniques such as performing cluster analysis and generating effective analysis reports and visualizations. Throughout the book, you will get to know the common problems and obstacles you might encounter while implementing each data analysis technique in R, along with ways to overcome them. By the end of this book, you will have the knowledge you need to become an expert in data analysis with R and to put your skills to the test in real-world scenarios.
Table of Contents (14 chapters)

Rescaling a variable to specified min-max range

Distance computations play a big role in many data analytics techniques. Variables measured on larger scales tend to dominate distance computations, so you may want to rescale each variable to the range 0 to 1.
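
To see the effect concretely, the following sketch compares Euclidean distances before and after min-max rescaling. The toy data frame and the minmax() helper are made up for illustration and are not part of the book's data or of any package; only base R is used.

```r
# Hypothetical toy data: two variables on very different scales
toy <- data.frame(Age    = c(25, 60, 26, 40),
                  Income = c(50000, 50500, 52000, 90000))

# Distances on the raw values: Income differences swamp Age differences
d.raw <- as.matrix(dist(toy))

# Min-max rescale each column to [0, 1] (illustrative helper)
minmax <- function(x) (x - min(x)) / (max(x) - min(x))
toy.scaled <- as.data.frame(lapply(toy, minmax))
d.scaled <- as.matrix(dist(toy.scaled))

# Raw data: row 1 appears closer to row 2 (35 years older, similar income)
d.raw[1, 2] < d.raw[1, 3]
# Rescaled data: row 1 is closer to row 3 (1 year older, slightly higher income)
d.scaled[1, 3] < d.scaled[1, 2]
```

Both comparisons evaluate to TRUE here, which shows that the nearest neighbor of the first row changes once the two variables contribute on a comparable scale.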

Getting ready

Install the scales package and copy the data-conversion.csv file from the book's data for this chapter into your R working directory. Then load the package and read the file:

> install.packages("scales")
> library(scales)
> students <- read.csv("data-conversion.csv")

How to do it...

To rescale the Income variable to the range [0,1], use the following code snippet:

> students$Income.rescaled <- rescale(students$Income) 

How it works...

By default, the rescale() function makes the lowest value(s) zero and the highest value(s) one. It rescales all the other values proportionately. The following two expressions provide identical results:

> rescale(students$Income) 
> (students$Income - min(students$Income)) / (max(students$Income) - min(students$Income))
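
The formula can be checked by hand on a small vector. The sketch below uses made-up values rather than the book's data and relies on base R; the comparison with rescale() runs only if the scales package is installed. It also shows that the manual form needs na.rm = TRUE when the data contains missing values, because min() and max() otherwise return NA.

```r
# Hypothetical income values for illustration
income <- c(20000, 35000, 50000, 80000)

# Manual min-max rescaling
manual <- (income - min(income)) / (max(income) - min(income))
manual
# [1] 0.00 0.25 0.50 1.00

# Compare with scales::rescale() when the package is available
if (requireNamespace("scales", quietly = TRUE)) {
  print(all.equal(manual, scales::rescale(income)))
}

# With a missing value, pass na.rm = TRUE so min() and max() ignore it
income.na <- c(income, NA)
lo <- min(income.na, na.rm = TRUE)
hi <- max(income.na, na.rm = TRUE)
manual.na <- (income.na - lo) / (hi - lo)   # the NA entry stays NA
```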

To rescale to a range other than [0,1], use the to argument. The following snippet rescales students$Income to the range [1,100]:

> rescale(students$Income, to = c(1, 100)) 
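
Rescaling to an arbitrary range [a, b] amounts to stretching the [0,1] result by (b - a) and shifting it by a. The following base R sketch spells out that arithmetic with a hypothetical helper, rescale.to(), which is written only for illustration and is not part of the scales package. The input values are made up.

```r
# Hypothetical helper illustrating the arithmetic behind the 'to' argument
rescale.to <- function(x, to = c(0, 1)) {
  unit <- (x - min(x)) / (max(x) - min(x))   # first map to [0, 1]
  to[1] + unit * (to[2] - to[1])             # then stretch and shift
}

income <- c(20000, 35000, 50000, 80000)      # made-up values
rescale.to(income, to = c(1, 100))
# [1]   1.00  25.75  50.50 100.00
```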

There's more...

When using distance-based techniques, you may need to rescale several variables. You may find it tedious to scale one variable at a time.

Rescaling many variables at once

Use the following function to rescale variables:

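rescale.many <- function(dat, column.nos) {
  # Requires library(scales) for rescale()
  nms <- names(dat)
  for (col in column.nos) {
    # Name each new column after its source, for example Income.rescaled
    name <- paste(nms[col], ".rescaled", sep = "")
    dat[name] <- rescale(dat[, col])
  }
  cat("Rescaled", length(column.nos), "variable(s)\n")
  # Return the modified copy; the caller's data frame is left unchanged
  dat
}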

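With the preceding function defined, we can rescale the first and fourth variables in the data frame. The function returns a modified copy rather than changing students in place, so assign the result to keep the new columns:

> students <- rescale.many(students, c(1,4))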
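
As an alternative sketch, several columns can be rescaled without an explicit loop by applying a rescaling function with lapply(). The example below uses a made-up data frame and a base R minmax() helper, both hypothetical, so that it runs without extra packages; with scales loaded, rescale could be passed in place of the helper. Selecting columns by name rather than by position is a design choice that keeps the code working if the column order changes.

```r
# Hypothetical data frame standing in for 'students'
toy <- data.frame(Age    = c(20, 30, 60),
                  State  = c("NJ", "NY", "TX"),
                  Height = c(60, 66, 72))

# Illustrative min-max helper (not from any package)
minmax <- function(x) (x - min(x)) / (max(x) - min(x))

# Rescale the chosen columns and append them with a ".rescaled" suffix
cols <- c("Age", "Height")
toy[paste0(cols, ".rescaled")] <- lapply(toy[cols], minmax)

toy$Age.rescaled
# [1] 0.00 0.25 1.00
toy$Height.rescaled
# [1] 0.0 0.5 1.0
```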

See also

  • The Normalizing or standardizing data in a data frame recipe in this chapter.