Book Image

R Data Analysis Cookbook

By : Viswa Viswanathan, Shanthi Viswanathan
Book Image

R Data Analysis Cookbook

By: Viswa Viswanathan, Shanthi Viswanathan

Overview of this book

<p>Data analytics with R has emerged as a very important focus for organizations of all kinds. R enables even those with only an intuitive grasp of the underlying concepts, without a deep mathematical background, to unleash powerful and detailed examinations of their data.</p> <p>This book empowers you by showing you ways to use R to generate professional analysis reports. It provides examples for various important analysis and machine-learning tasks that you can try out with associated and readily available data. The book also teaches you to quickly adapt the example code for your own needs and save yourself the time needed to construct code from scratch.</p>
Table of Contents (18 chapters)
R Data Analysis Cookbook
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Preface
Index

Reading data from fixed-width formatted files


In fixed-width formatted files, columns have fixed widths; if a data element does not use up the entire allotted column width, then the element is padded with spaces to make up the specified width. To read fixed-width text files, specify columns by column widths or by starting positions.

Getting ready

Download the files for this chapter and store the student-fwf.txt file in your R working directory.

How to do it...

Read the fixed-width formatted file as follows:

> student  <- read.fwf("student-fwf.txt", widths=c(4,15,20,15,4), col.names=c("id","name","email","major","year"))

How it works...

In the student-fwf.txt file, the first column occupies 4 character positions, the second 15, and so on. The c(4,15,20,15,4) expression specifies the widths of the five columns in the data file.

We can use the optional col.names argument to supply our own variable names.

There's more...

The read.fwf() function has several optional arguments that come in handy. We discuss a few of these as follows:

Files with headers

Files with headers use the following command:

> student  <- read.fwf("student-fwf-header.txt", widths=c(4,15,20,15,4), header=TRUE, sep="\t",skip=2)

If header=TRUE, the first row of the file is interpreted as having the column headers. Column headers, if present, need to be separated by the specified sep argument. The sep argument only applies to the header row.

The skip argument denotes the number of lines to skip; in this recipe, the first two lines are skipped.

Excluding columns from data

To exclude a column, make the column width negative. Thus, to exclude the e-mail column, we will specify its width as -20 and also remove the column name from the col.names vector as follows:

> student <- read.fwf("student-fwf.txt",widths=c(4,15,-20,15,4), col.names=c("id","name","major","year"))