Data Manipulation with R - Second Edition

By: Jaynal Abedin, Kishor Kumar Das

Overview of this book

This book starts with the installation of R and how to go about using R and its libraries. We then discuss the modes of R objects and their classes, and then highlight the different R data types with their basic operations.

The primary focus, group-wise data manipulation with the split-apply-combine strategy, is explained with specific examples. The book also covers some specific libraries such as lubridate, reshape2, plyr, dplyr, stringr, and sqldf. You will not only learn about group-wise data manipulation, but also how to efficiently handle date, string, and factor variables, along with different layouts of datasets using the reshape2 package.

By the end of this book, you will have learned about text manipulation using stringr, how to extract data from Twitter using the twitteR library, how to clean raw data, and how to structure your raw data for data mining.

Comparing base R and plyr


In this section, we compare base R and plyr code side by side, solving the same problem with each. Reusing the iris3 data, we now want to produce five-number summary statistics for each variable, grouped by species. The five numbers are the minimum, mean, median, maximum, and standard deviation. (This differs from the classical five-number summary returned by base R's fivenum(), which reports the lower and upper hinges rather than the mean and standard deviation.) The output will be a list of data frames, one per species.
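To make the problem concrete, the following minimal sketch inspects the layout of iris3 and computes the five statistics for a single variable of a single species. The variable names `x` and `stats_one` are illustrative assumptions and do not come from the book.

```r
# iris3 ships with base R (package 'datasets') as a 3-dimensional array:
# 50 observations x 4 variables x 3 species
dim(iris3)             # 50 4 3
dimnames(iris3)[[2]]   # "Sepal L." "Sepal W." "Petal L." "Petal W."
dimnames(iris3)[[3]]   # "Setosa" "Versicolor" "Virginica"

# The five statistics for one variable of one species
# (x and stats_one are hypothetical names used only in this sketch)
x <- iris3[, "Sepal L.", "Setosa"]
stats_one <- c(min = min(x), mean = mean(x), median = median(x),
               max = max(x), sd = sd(x))
stats_one
```

The task in this section repeats this calculation for all four variables within each of the three species.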

To calculate the five-number summary statistics, follow these steps:

  1. Define a function that calculates the five-number summary statistics for each column of a given matrix.

  2. Return the output of this function as a data frame.

  3. Apply this function to the iris3 dataset using a for loop.

  4. Apply the same function using the alply() function of the plyr package, which takes an array as input and returns a list.
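The steps above can be sketched end to end as follows. This is an illustrative version written under stated assumptions, not the book's own listing: the helper name `col_summary` and the result names `by_loop` and `by_plyr` are hypothetical, and the plyr step assumes that `alply()`, plyr's array-to-list function, is the intended call. The plyr part only runs if the package is installed.

```r
# Steps 1 and 2: a hypothetical helper that summarises each column of a
# matrix and returns the result as a data frame (one row per variable)
col_summary <- function(x) {
  data.frame(min    = apply(x, 2, min),
             mean   = apply(x, 2, mean),
             median = apply(x, 2, median),
             max    = apply(x, 2, max),
             sd     = apply(x, 2, sd))
}

# Step 3: base R, looping over the third dimension (species)
n_species <- dim(iris3)[3]
by_loop <- vector("list", n_species)
names(by_loop) <- dimnames(iris3)[[3]]
for (i in seq_len(n_species)) {
  by_loop[[i]] <- col_summary(iris3[, , i])
}

# Step 4: the same task with plyr, assuming alply() is the intended function;
# margin 3 splits the array by species and returns a list
if (requireNamespace("plyr", quietly = TRUE)) {
  by_plyr <- plyr::alply(iris3, 3, col_summary)
}

by_loop[["Setosa"]]
```

The base R version needs explicit bookkeeping (pre-allocating the list and indexing inside the loop), whereas the plyr call expresses the split, apply, and combine steps in a single line.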

The following code calculates the five-number summary statistics:

# Function to calculate the five-number summary
fivenum.summary <- function(x)
{
  results <- data.frame(min = apply(x, 2, min),
                        mean = apply(x, 2, mean),
                        median = apply(x, 2, median),
                        max...