Mastering Data Analysis with R

By: Gergely Daróczi

By-passing missing values


So it seems that missing data occurs relatively frequently in the time-related variables, but we have no missing values among the flight identifiers and dates. On the other hand, if one value is missing for a flight, the chances are rather high that some other variables are missing as well, given the overall number of 3,622 cases with at least one missing value:

> mean(cor(apply(hflights, 2, function(x)
+    as.numeric(is.na(x)))), na.rm = TRUE)
[1] 0.9589153
Warning message:
In cor(apply(hflights, 2, function(x) as.numeric(is.na(x)))) :
  the standard deviation is zero

Okay, let's see what we have done here! First, we called the apply function to transform the values of the data.frame to 0 or 1, where 0 stands for an observed value and 1 for a missing one. Then we computed the correlation coefficients of this newly created matrix, which of course returned a lot of missing values due to the fact that some columns had only one unique value without any variability...
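To make these steps easier to follow, here is a minimal sketch on a small toy data.frame instead of hflights. The column names and values below are invented purely for illustration, and the id column stands in for a variable that is never missing:

```r
# Hypothetical toy data standing in for hflights (values are made up)
df <- data.frame(
  id       = 1:6,                         # never missing, like the flight identifiers
  dep_time = c(10, NA, 12, NA, 14, 15),
  arr_time = c(11, NA, 13, NA, 15, NA),
  air_time = c(1,  NA, 1,  NA, 1,  1)
)

# Step 1: build a 0/1 indicator matrix, where 1 marks a missing value
miss <- apply(df, 2, function(x) as.numeric(is.na(x)))

# Step 2: pairwise correlations of the indicators; the id column is
# constant (all zeros), so its correlations are NA and cor() warns that
# the standard deviation is zero (the warning is suppressed here)
cors <- suppressWarnings(cor(miss))

# Step 3: average the coefficients, skipping the NA entries
avg <- mean(cors, na.rm = TRUE)
avg
```

Note that this average also includes the diagonal of the correlation matrix (each variable correlated with itself), so it is best read as a rough indicator of how strongly the missingness patterns move together rather than as a precise statistic.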