Sign In Start Free Trial
Account

Add to playlist

Create a Playlist

Modal Close icon
You need to login to use this feature.
  • Book Overview & Buying R Data Analysis Cookbook, Second Edition
  • Table Of Contents Toc
  • Feedback & Rating feedback
R Data Analysis Cookbook, Second Edition

R Data Analysis Cookbook, Second Edition - Second Edition

By : Kuntal Ganguly, Viswanathan, Viswa Viswanathan
3.3 (4)
close
close
R Data Analysis Cookbook, Second Edition

R Data Analysis Cookbook, Second Edition

3.3 (4)
By: Kuntal Ganguly, Viswanathan, Viswa Viswanathan

Overview of this book

Data analytics with R has emerged as a very important focus for organizations of all kinds. R enables even those with only an intuitive grasp of the underlying concepts, without a deep mathematical background, to unleash powerful and detailed examinations of their data. This book will show you how you can put your data analysis skills in R to practical use, with recipes catering to the basic as well as advanced data analysis tasks. Right from acquiring your data and preparing it for analysis to the more complex data analysis techniques, the book will show you how you can implement each technique in the best possible manner. You will also visualize your data using the popular R packages like ggplot2 and gain hidden insights from it. Starting with implementing the basic data analysis concepts like handling your data to creating basic plots, you will master the more advanced data analysis techniques like performing cluster analysis, and generating effective analysis reports and visualizations. Throughout the book, you will get to know the common problems and obstacles you might encounter while implementing each of the data analysis techniques in R, with ways to overcoming them in the easiest possible way. By the end of this book, you will have all the knowledge you need to become an expert in data analysis with R, and put your skills to test in real-world scenarios.
Table of Contents (14 chapters)
close
close

Reading data from CSV files

CSV formats are best used to represent sets or sequences of records in which each record has an identical list of fields. This corresponds to a single relation in a relational database, or to data (though not calculations) in a typical spreadsheet.

Getting ready

If you have not already downloaded the files for this chapter, do it now and ensure that the auto-mpg.csv file is in your R working directory.

How to do it...

Reading data from .csv files can be done using the following commands:

  1. Read the data from auto-mpg.csv, which includes a header row:
> auto <- read.csv("auto-mpg.csv", header=TRUE, sep = ",") 
  1. Verify the results:
> names(auto) 

How it works...

The read.csv() function creates a data frame from the data in the .csv file. If we pass header=TRUE, then the function uses the very first row to name the variables in the resulting data frame:

> names(auto) 

[1] "No" "mpg" "cylinders"
[4] "displacement" "horsepower" "weight"
[7] "acceleration" "model_year" "car_name"

The header and sep parameters allow us to specify whether the .csv file has headers and the character used in the file to separate fields. The header=TRUE and sep="," parameters are the defaults for the read.csv() function; we can omit these in the code example.

There's more...

The read.csv() function is a specialized form of read.table(). The latter uses whitespace as the default field separator. We will discuss a few important optional arguments to these functions.

Handling different column delimiters

In regions where a comma is used as the decimal separator, the .csv files use ";" as the field delimiter. While dealing with such data files, use read.csv2() to load data into R.

Alternatively, you can use the read.csv("<file name>", sep=";", dec=",") command.

Use sep="t" for tab-delimited files.

Handling column headers/variable names

If your data file does not have column headers, set header=FALSE.

The auto-mpg-noheader.csv file does not include a header row. The first command in the following snippet reads this file. In this case, R assigns default variable names V1, V2, and so on.

> auto  <- read.csv("auto-mpg-noheader.csv", header=FALSE) 
> head(auto,2)

V1 V2 V3 V4 V5 V6 V7 V8 V9
1 1 28 4 140 90 2264 15.5 71 chevrolet vega 2300
2 2 19 3 70 97 2330 13.5 72 mazda rx2 coupe

If your file does not have a header row, and you omit the header=FALSE optional argument, the read.csv() function uses the first row for variable names and ends up constructing variable names by adding X to the actual data values in the first row. Note the meaningless variable names in the following fragment:

> auto  <- read.csv("auto-mpg-noheader.csv") 
> head(auto,2)

X1 X28 X4 X140 X90 X2264 X15.5 X71 chevrolet.vega.2300
1 2 19 3 70 97 2330 13.5 72 mazda rx2 coupe
2 3 36 4 107 75 2205 14.5 82 honda accord

We can use the optional col.names argument to specify the column names. If col.names is given explicitly, the names in the header row are ignored, even if header=TRUE is specified:

> auto <- read.csv("auto-mpg-noheader.csv",     header=FALSE, col.names =       c("No", "mpg", "cyl", "dis","hp",         "wt", "acc", "year", "car_name")) 

> head(auto,2)

No mpg cyl dis hp wt acc year car_name
1 1 28 4 140 90 2264 15.5 71 chevrolet vega 2300
2 2 19 3 70 97 2330 13.5 72 mazda rx2 coupe

Handling missing values

When reading data from text files, R treats blanks in numerical variables as NA (signifying missing data). By default, it reads blanks in categorical attributes just as blanks and not as NA. To treat blanks as NA for categorical and character variables, set na.strings="":

> auto  <- read.csv("auto-mpg.csv", na.strings="") 

If the data file uses a specified string (such as "N/A" or "NA" for example) to indicate the missing values, you can specify that string as the na.strings argument, as in na.strings= "N/A" or na.strings = "NA".

Reading strings as characters and not as factors

By default, R treats strings as factors (categorical variables). In some situations, you may want to leave them as character strings. Use stringsAsFactors=FALSE to achieve this:

> auto <- read.csv("auto-mpg.csv",stringsAsFactors=FALSE) 

However, to selectively treat variables as characters, you can load the file with the defaults (that is, read all strings as factors) and then use as.character() to convert the requisite factor variables to characters.

Reading data directly from a website

If the data file is available on the web, you can load it directly into R, instead of downloading and saving it locally before loading it into R:

> dat <- read.csv("http://www.exploredata.net/ftp/WHO.csv") 
Visually different images
CONTINUE READING
83
Tech Concepts
36
Programming languages
73
Tech Tools
Icon Unlimited access to the largest independent learning library in tech of over 8,000 expert-authored tech books and videos.
Icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Icon 50+ new titles added per month and exclusive early access to books as they are being written.
R Data Analysis Cookbook, Second Edition
notes
bookmark Notes and Bookmarks search Search in title playlist Add to playlist download Download options font-size Font size

Change the font size

margin-width Margin width

Change margin width

day-mode Day/Sepia/Night Modes

Change background colour

Close icon Search
Country selected

Close icon Your notes and bookmarks

Confirmation

Modal Close icon
claim successful

Buy this book with your credits?

Modal Close icon
Are you sure you want to buy this book with one of your credits?
Close
YES, BUY

Submit Your Feedback

Modal Close icon
Modal Close icon
Modal Close icon