R Data Analysis Cookbook, Second Edition

By: Kuntal Ganguly, Shanthi Viswanathan, Viswa Viswanathan, Davor Lozić, Mzabalazo Z. Ngwenya, Andrew Bauman
Overview of this book

Data analytics with R has emerged as an important focus for organizations of all kinds. R enables even those with only an intuitive grasp of the underlying concepts, and no deep mathematical background, to carry out powerful and detailed examinations of their data. This book shows you how to put your data analysis skills in R to practical use, with recipes covering both basic and advanced data analysis tasks. From acquiring your data and preparing it for analysis to the more complex data analysis techniques, the book shows you how to implement each technique. You will also visualize your data using popular R packages such as ggplot2 and gain insights from it. Starting with basic data analysis concepts such as handling your data and creating basic plots, you will move on to more advanced techniques such as performing cluster analysis and generating effective analysis reports and visualizations. Throughout the book, you will get to know the common problems and obstacles you might encounter while implementing each data analysis technique in R, and how to overcome them. By the end of this book, you will have the knowledge you need to become an expert in data analysis with R and to put your skills to the test in real-world scenarios.

Binning numerical data

Sometimes, we need to convert numerical data to categorical data, that is, to a factor. For example, Naive Bayes classification in its basic form works with categorical variables (independent and dependent). In other situations, we may want to apply a classification method to a problem where the dependent variable is numeric and therefore has to be binned into categories first.

Getting ready

From the code files for this chapter, store the data-conversion.csv file in the working directory of your R environment. Then read the data:

> students <- read.csv("data-conversion.csv") 

How to do it...

Income is a numeric variable, and you may want to create a categorical variable from it by creating bins. Suppose you want to label incomes of $10,000 or below as Low, incomes above $10,000 and up to $31,000 as Medium, and the rest as High. We can do the following:

  1. Create a vector of break points:
> b <- c(-Inf, 10000, 31000, Inf)
  2. Create a vector of names for the bins:
> names <- c("Low", "Medium", "High")
  3. Cut the vector using the break points:
> students$Income.cat <- cut(students$Income, breaks = b, labels = names)
> students

   Age State Gender Height Income Income.cat
1   23    NJ      F     61   5000        Low
2   13    NY      M     55   1000        Low
3   36    NJ      M     66   3000        Low
4   31    VA      F     64   4000        Low
5   58    NY      F     70  30000     Medium
6   29    TX      F     63  10000        Low
7   39    NJ      M     67  50000       High
8   50    VA      M     70  55000       High
9   23    TX      F     61   2000        Low
10  36    VA      M     66  20000     Medium

How it works...

The cut() function uses the ranges implied by the breaks argument to infer the bins, and names them according to the strings provided in the labels argument. In our example, the function places incomes less than or equal to 10,000 in the first bin, incomes greater than 10,000 and less than or equal to 31,000 in the second bin, and incomes greater than 31,000 in the third bin. In other words, the first number in the interval is not included but the second one is. The number of bins will be one less than the number of elements in breaks. The strings in names become the factor levels of the bins.
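The boundary behavior described above comes from the right argument of cut(), which defaults to TRUE. The following is a minimal sketch of what happens when it is switched off. It assumes the income values copied from the table in this recipe rather than the CSV file, and bin_names is a hypothetical variable name used only for this illustration.

```r
# Income values copied from the students table shown in this recipe
# (an assumption made so the sketch runs without data-conversion.csv)
income <- c(5000, 1000, 3000, 4000, 30000, 10000, 50000, 55000, 2000, 20000)

b <- c(-Inf, 10000, 31000, Inf)
bin_names <- c("Low", "Medium", "High")  # hypothetical name for this sketch

# Default (right = TRUE): each interval includes its upper break,
# so an income of exactly 10000 is labeled "Low"
right_closed <- cut(income, breaks = b, labels = bin_names)

# right = FALSE: each interval includes its lower break instead,
# so an income of exactly 10000 is labeled "Medium"
left_closed <- cut(income, breaks = b, labels = bin_names, right = FALSE)

# Compare the two assignments side by side
data.frame(income, right_closed, left_closed)
```

Only the row with an income of exactly 10000 changes between the two columns, because it is the only value that sits on a break point.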

If we leave out names, cut() uses the numbers in the second argument to construct interval names, as you can see here:

> b <- c(-Inf, 10000, 31000, Inf) 
> students$Income.cat1 <- cut(students$Income, breaks = b)
> students

   Age State Gender Height Income Income.cat     Income.cat1
1   23    NJ      F     61   5000        Low    (-Inf,1e+04]
2   13    NY      M     55   1000        Low    (-Inf,1e+04]
3   36    NJ      M     66   3000        Low    (-Inf,1e+04]
4   31    VA      F     64   4000        Low    (-Inf,1e+04]
5   58    NY      F     70  30000     Medium (1e+04,3.1e+04]
6   29    TX      F     63  10000        Low    (-Inf,1e+04]
7   39    NJ      M     67  50000       High  (3.1e+04, Inf]
8   50    VA      M     70  55000       High  (3.1e+04, Inf]
9   23    TX      F     61   2000        Low    (-Inf,1e+04]
10  36    VA      M     66  20000     Medium (1e+04,3.1e+04]
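The scientific notation in these interval names appears because cut() formats break values with three significant digits by default. As a small sketch, the dig.lab argument of cut() can be raised so that the breaks are written out in full. The income vector is again copied from the table in this recipe, and income_cat is a hypothetical name.

```r
# Income values copied from the students table shown in this recipe
income <- c(5000, 1000, 3000, 4000, 30000, 10000, 50000, 55000, 2000, 20000)
b <- c(-Inf, 10000, 31000, Inf)

# dig.lab sets how many digits are used when formatting break values
# in the generated labels (the default is 3)
income_cat <- cut(income, breaks = b, dig.lab = 5)

# Inspect the generated interval names
levels(income_cat)
```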

There's more...

You might not always be in a position to identify the breaks manually and may instead want to rely on R to do this automatically.

Creating a specified number of intervals automatically

Rather than determining the breaks and hence the intervals manually, as mentioned earlier, we can specify the number of bins we want, say n, and let the cut() function handle the rest automatically. In this case, cut() creates n intervals of approximately equal width, as follows:

> students$Income.cat2 <- cut(students$Income, breaks = 4,
+     labels = c("Level1", "Level2", "Level3", "Level4"))
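To see which intervals cut() chose, it can help to call it once without labels and to tabulate the result. The sketch below does this under the assumption that the incomes are the ten values shown in the tables of this recipe. The names income and income_cat2 are hypothetical stand-ins for the data frame columns.

```r
# Income values copied from the students table shown in this recipe
income <- c(5000, 1000, 3000, 4000, 30000, 10000, 50000, 55000, 2000, 20000)

# Four bins of approximately equal width, with custom labels
income_cat2 <- cut(income, breaks = 4,
                   labels = c("Level1", "Level2", "Level3", "Level4"))

# Without labels, the level names show the intervals cut() computed
levels(cut(income, breaks = 4))

# Count how many incomes landed in each bin
table(income_cat2)
```

Equal width does not mean equal counts. With this skewed income data, most values fall into Level1, while the middle bins hold one value each.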