Data can go missing for many reasons, for example typos or flaws in data processing. If missing values go unnoticed, the results of the analysis may be misleading. It is therefore important to detect missing values before proceeding with further analysis.
Refer to the Converting data types recipe and convert each attribute of the imported data into the proper data type. Also, rename the columns of the employees and salaries datasets by following the steps from the Renaming the data variable recipe.
Perform the following steps to detect missing values:
First, we select the rows whose to_date attribute holds a date later than 2100-01-01:
> salaries[salaries$to_date > "2100-01-01",]
We then change every to_date value later than 2100-01-01 to a missing value:
> salaries[salaries$to_date > "2100-01-01","to_date"] = NA
Next, we can use the is.na function to find which rows contain missing values:
> is.na(salaries...
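The steps above can be tried end to end on a small stand-in data frame. The following is a minimal sketch, not the recipe's own code: the toy rows, the column names other than to_date, and the helper variables cutoff, is_placeholder, missing_rows, and na_counts are all assumptions made for illustration. It also assumes to_date has already been converted to the Date class. The call on the last step is cut off in the text, so the sketch does not try to reproduce it and simply applies is.na to the to_date column and to the whole data frame.

```r
# Hypothetical toy data standing in for the salaries dataset
salaries <- data.frame(
  emp_no    = c(10001, 10001, 10002, 10003),
  salary    = c(60117, 62102, 65828, 40006),
  from_date = as.Date(c("1986-06-26", "1987-06-26", "1996-08-03", "1995-12-03")),
  to_date   = as.Date(c("1987-06-26", "9999-01-01", "9999-01-01", "1996-12-02"))
)

# Flag end dates that lie after the cutoff (treated here as placeholders)
cutoff <- as.Date("2100-01-01")
is_placeholder <- salaries$to_date > cutoff

# Replace the flagged dates with NA
salaries$to_date[is_placeholder] <- NA

# Logical vector marking the rows whose to_date is missing
missing_rows <- is.na(salaries$to_date)

# Number of missing values in each column
na_counts <- colSums(is.na(salaries))
```

Keeping the logical index in its own variable makes it easy to inspect which rows will be changed before they are overwritten, and colSums over is.na gives a quick per-column summary once the replacement is done.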