The previous recipe showed how to detect missing values within a dataset. Although data with missing values is incomplete, we can still adopt a heuristic approach to complete the dataset. Here, we introduce some techniques for imputing missing values.
Refer to the Converting data types recipe and convert each attribute of the imported data into the proper data type. Also, rename the columns of the employees and salaries datasets by following the steps in the Renaming the data variable recipe.
Perform the following steps to impute missing values:
First, we subset the salaries data to the rows with emp_no equal to 10001:
> test.emp <- salaries[salaries$emp_no == 10001,]
Then, we purposely set the salary in row 8 to a missing value:
> test.emp[8,c("salary")]
[1] 75286
> test.emp[8,c("salary")] = NA
For the first imputing method, we can remove records with missing values using the na.omit function:
> na.omit(test.emp)
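The steps so far can be tried without the employees database. The following is a minimal sketch on a small stand-in data frame: the name toy and every value in it are made up for illustration and are not taken from the salaries dataset. It subsets one employee, sets one salary to NA, and then drops the incomplete row with na.omit.

```r
# Hypothetical stand-in for the salaries data (values are invented)
toy <- data.frame(
  emp_no = c(10001, 10001, 10001, 10002),
  salary = c(50000, 52000, 54000, 61000)
)

# Keep only the rows that belong to one employee
toy.emp <- toy[toy$emp_no == 10001, ]

# Deliberately replace the second salary with a missing value
toy.emp[2, "salary"] <- NA

# complete.cases() returns FALSE for any row that contains an NA
complete.cases(toy.emp)

# na.omit() drops every row that contains at least one NA
cleaned <- na.omit(toy.emp)
nrow(cleaned)
```

Because na.omit discards the whole row, any other columns in that record are lost as well, so this approach is best suited to cases where only a few records are incomplete.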
On the other...