Mastering Data Analysis with R

By: Gergely Daróczi

Identifying missing data


The easiest way to deal with missing values, especially with MCAR (missing completely at random) data, is simply to remove every observation that contains a missing value. To exclude every row of a matrix or data.frame object that has at least one missing value, we can use the complete.cases function from the stats package, which returns a logical vector marking the rows without any missing values.
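As a minimal sketch of how this filtering works, the following example applies complete.cases to a small data frame. The data frame df and all of its values are made up for illustration and are not part of the hflights dataset:

```r
# Toy data frame (hypothetical values) with NAs in three of the five rows
df <- data.frame(
    id    = 1:5,
    delay = c(10, NA, 3, 7, NA),
    dest  = c("AUS", "DFW", NA, "IAH", "SAT"),
    stringsAsFactors = FALSE
)

# TRUE marks the rows without any missing value
keep <- complete.cases(df)
keep                      # TRUE FALSE FALSE TRUE FALSE

# Keep only the complete rows
df_complete <- df[keep, ]
nrow(df_complete)         # 2

# na.omit() from the stats package drops the same rows
identical(rownames(na.omit(df)), rownames(df_complete))   # TRUE
```

Subsetting with the logical vector keeps the decision explicit, which helps when we also want to inspect the dropped rows via df[!keep, ].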

For a quick start, let's see how many rows have at least one missing value:

> library(hflights)
> table(complete.cases(hflights))
 FALSE   TRUE 
  3622 223874

This is around 1.6 percent of the roughly 227,000 rows:

> prop.table(table(complete.cases(hflights))) * 100
    FALSE      TRUE 
 1.592116 98.407884
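The percentage can also be checked by hand. The sketch below reuses the two counts printed by table() earlier and then shows a shortcut based on the mean of a logical vector. The flags vector is a made-up stand-in for a complete.cases result, not real hflights data:

```r
# Counts taken from the table() output shown earlier
incomplete <- 3622
complete   <- 223874

# Share of rows with at least one missing value, in percent
pct <- incomplete / (incomplete + complete) * 100
round(pct, 6)             # 1.592116

# The mean of a logical vector is the proportion of TRUE values, so the
# share of incomplete rows can be computed without building a table
flags <- c(TRUE, TRUE, FALSE, TRUE)   # hypothetical complete.cases() result
pct_toy <- mean(!flags) * 100
pct_toy                   # 25
```

With the real dataset, the equivalent one-liner would be mean(!complete.cases(hflights)) * 100.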

Let's see how the NA values are distributed across the columns:

> sort(sapply(hflights, function(x) sum(is.na(x))))
             Year             Month        DayofMonth 
                0                 0                 0 
        DayOfWeek     UniqueCarrier         FlightNum 
                0                 0         ...