We will now cover some basic measures of central tendency and dispersion, along with simple plots. The first question we will address is: how does R handle missing values in calculations? To see what happens, create a vector with a missing value (NA in the R language), then sum the values of the vector with sum():
> a <- c(1, 2, 3, NA)
> sum(a)
[1] NA
Unlike SAS, which would sum the non-missing values, R simply returns NA, indicating that at least one value is missing. We could create a new vector with the missing value deleted, but you can also tell sum() to exclude any missing values with the argument na.rm = TRUE:
> sum(a, na.rm = TRUE)
[1] 6
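The two ways of dealing with a missing value mentioned above can be sketched side by side. This is only an illustration: the name a_complete is a hypothetical one chosen here, and the vector repeats the values used earlier so the snippet runs on its own.

```r
# Same values as the example vector above, repeated so this runs standalone
a <- c(1, 2, 3, NA)

# Option 1: build a new vector that keeps only the non-missing values.
# a_complete is a hypothetical name used for this sketch.
a_complete <- a[!is.na(a)]
sum(a_complete)          # 6

# Option 2: leave the vector untouched and drop NAs inside the call.
# na.rm = TRUE is also accepted by mean(), median(), sd(), max(), and min().
mean(a, na.rm = TRUE)    # 2

# is.na() returns TRUE for each missing element, so summing it counts the NAs
sum(is.na(a))            # 1
```

Option 1 changes the length of the data you carry forward, while Option 2 keeps the original vector intact and handles the missing value one call at a time.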
R provides functions for the common measures of central tendency and dispersion of a vector:
> data <- c(4, 3, 2, 5.5, 7.8, 9, 14, 20)
> mean(data)
[1] 8.1625
> median(data)
[1] 6.65
> sd(data)
[1] 6.142112
> max(data)
[1] 20
> min(data)
[1] 2
> range...