Sometimes, it is acceptable to have NA values in the dataset. However, for many types of analysis, NA values need to be either removed or replaced. In the case of road length, for example, a better estimate of total road length could be generated if the NA values were replaced with best guesses. In the following subsections, I will walk through three approaches to handling NA values:
- Deletion
- Insertion
- Imputation
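The sketch below makes the road-length point concrete. It uses pandas and a small hypothetical table; the `region` and `road_length_km` columns and all values are invented for illustration. A plain sum silently skips missing lengths, so those regions are treated as if they had no roads. Filling the gaps changes the total. Filling with the column mean is only one simple stand-in for a "best guess" and is not necessarily the imputation method used later in this article.

```python
import numpy as np
import pandas as pd

# Hypothetical road-length table (km); values are invented, not from the real dataset.
roads = pd.DataFrame(
    {
        "region": ["A", "B", "C", "D"],
        "road_length_km": [120.0, np.nan, 95.0, np.nan],
    }
)

# pandas skips NaN when summing, so regions with missing lengths add nothing.
naive_total = roads["road_length_km"].sum()

# One simple "best guess" (an assumption, not the article's method):
# replace each missing length with the mean of the observed lengths.
mean_length = roads["road_length_km"].mean()
filled_total = roads["road_length_km"].fillna(mean_length).sum()

print(naive_total)   # 215.0
print(filled_total)  # 430.0
```

Here the naive total covers only two of the four regions. The filled total is an estimate for all four, and its quality depends entirely on how good the guesses are.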
The simplest way to handle NA values is to delete any entry that contains an NA value, or any entry that contains more than a certain number of NA values. When removing entries with NA values, there is a trade-off between the correctness of the data and the completeness of the data. Data entries that contain NA values may also contain several useful non-NA values, and removing too many data entries could reduce the dataset to a point where it is no longer useful.
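Both deletion rules can be sketched with pandas' `DataFrame.dropna`. `how="any"` drops a row containing any NA, and `thresh=k` keeps only rows with at least `k` non-NA values. The wide table below, with one row per region and one column per year, is hypothetical. Its layout and values are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical wide table: one row per region, one column per year (km).
roads = pd.DataFrame(
    {
        "region": ["A", "B", "C", "D"],
        "2018": [120.0, np.nan, 95.0, np.nan],
        "2019": [121.0, 80.0, np.nan, np.nan],
        "2020": [122.0, 81.0, 96.0, 40.0],
    }
).set_index("region")

# Strict deletion: drop any row that contains at least one NA.
strict = roads.dropna(how="any")

# Lenient deletion: keep rows with at least 2 non-NA values,
# i.e. drop rows that are missing more than 1 of the 3 years.
lenient = roads.dropna(thresh=2)

print(list(strict.index))   # ['A']
print(list(lenient.index))  # ['A', 'B', 'C']
```

On this toy table, the strict rule keeps only one of four regions, even though B and C each still have two years of data. The threshold rule keeps three. This is the correctness-versus-completeness trade-off in miniature.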
For this dataset, it is not that important to have all of the years present; even one year is enough to give us a rough...