Mastering Data Analysis with R

By: Gergely Daróczi

Filtering missing data before or during the actual analysis


Let's suppose we want to calculate the mean of the actual length of flights:

> mean(hflights$ActualElapsedTime)
[1] NA

The result is NA, of course: as identified previously, this variable contains missing values, and almost every R operation involving NA returns NA. So let's overcome this issue as follows:

> mean(hflights$ActualElapsedTime, na.rm = TRUE)
[1] 129.3237
> mean(na.omit(hflights$ActualElapsedTime))
[1] 129.3237
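
To make the difference between the two calls easier to see, here is a minimal sketch on a toy vector. The vector `x` and the names `m_rm`, `x_clean`, and `m_omit` are made up for illustration and are not part of the hflights example. It shows NA propagating through `mean`, the two ways of dropping it, and the bookkeeping that `na.omit` adds to its result:

```r
# Toy vector with one missing value (hypothetical data, not from hflights)
x <- c(10, 20, NA, 40)

# A single NA is enough to make the plain mean NA
mean(x)

# Option 1: let mean() skip the missing values during the computation
m_rm <- mean(x, na.rm = TRUE)

# Option 2: drop the missing values first, then compute on the clean copy
x_clean <- na.omit(x)
m_omit  <- mean(x_clean)

# Both routes agree on the result
all.equal(m_rm, m_omit)

# Unlike na.rm, na.omit() returns a new object that records
# which positions were removed in its "na.action" attribute
attr(x_clean, "na.action")
```

Because `na.omit` builds a filtered copy and attaches this extra attribute before `mean` even starts, it is reasonable to expect it to do somewhat more work than `na.rm = TRUE` for a one-off summary; whether that matters in practice is a question for a benchmark.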

Are there any performance differences between the two? Or other grounds for deciding which method to use?

> library(microbenchmark)
> NA.RM   <- function()
+              mean(hflights$ActualElapsedTime, na.rm = TRUE)
> NA.OMIT <- function()
+              mean(na.omit(hflights$ActualElapsedTime))
> microbenchmark(NA.RM(), NA.OMIT())
Unit: milliseconds
      expr       min        lq    median        uq       max neval
   NA.RM()  7.105485  7.231737  7.500382  8.002941  9.850411   100
 NA.OMIT() 12.268637 12.471294...