
Mastering Data Analysis with R

By: Gergely Daróczi

Aggregation


The most straightforward way of summarizing data is to call the aggregate function from the stats package, which does exactly what we are looking for: it splits the data into subsets by one or more grouping variables, then computes summary statistics for each subset separately. The most basic way to call aggregate is to pass the numeric vector to be aggregated, a list of grouping variables in the by argument to define the splits, and the function to be applied to each subset in the FUN argument. Now, let's see the average ratio of diverted flights on each day of the week:

> library(hflights)
> aggregate(hflights$Diverted, by = list(hflights$DayOfWeek),
+   FUN = mean)
  Group.1           x
1       1 0.002997672
2       2 0.002559323
3       3 0.003226211
4       4 0.003065727
5       5 0.002687865
6       6 0.002823121
7       7 0.002589057

Well, it took some time to run the preceding command, but please bear in mind that we have just aggregated around a quarter of a million rows to compute, for each day of the week, the share of diverted flights (as Diverted is coded 0 for no and 1 for yes, its mean is exactly that proportion)...
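To make the split-then-summarize behaviour of aggregate more concrete, here is a minimal sketch on a small, made-up data frame. The toy object and its values are hypothetical and are not taken from hflights. The sketch repeats the call pattern used above, then shows the formula interface of aggregate, which names the result columns after the variables instead of Group.1 and x. Finally, it reproduces the same numbers by hand with split and vapply.

```r
# Hypothetical toy data: a 0/1 diversion flag for flights on three weekdays
toy <- data.frame(
  DayOfWeek = c(1, 1, 2, 2, 2, 3, 3, 3, 3),
  Diverted  = c(0, 1, 0, 0, 1, 0, 0, 0, 1)
)

# Same call pattern as in the text: the numeric vector, a list of
# grouping variables in `by`, and the summary function in FUN
by_list <- aggregate(toy$Diverted, by = list(toy$DayOfWeek), FUN = mean)

# Formula interface: the result columns are named DayOfWeek and Diverted
by_formula <- aggregate(Diverted ~ DayOfWeek, data = toy, FUN = mean)

# The split-apply steps done by hand: split the vector into one subset
# per weekday, then compute the mean of each subset separately
manual <- vapply(split(toy$Diverted, toy$DayOfWeek), mean, numeric(1))

print(by_formula)
print(manual)
```

The hand-rolled version returns a named numeric vector (split converts the grouping variable to a factor, whose levels become the names), while aggregate wraps the same idea into a data frame that is easier to process further.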