We have been discussing how a data scientist decides to address or correct a dirty data issue, such as missing, incorrect, incomplete, or inconsistent values within a data pool.
When data is missing (or incorrect, incomplete, or inconsistent) within a data pool, handling and analysis become more difficult, and the results of any analysis performed on the data can be biased. This leads us to imputation.
In statistics, imputation is the data cleansing procedure in which the data scientist replaces missing (or otherwise specified) data with substitute values.
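As a rough illustration of the definition above, the following minimal Python sketch replaces missing values with the mean of the observed values. Mean substitution is only one simple imputation method, chosen here for brevity; the function name `impute_mean`, the use of `None` to mark a missing value, and the sample `ages` list are all assumptions made for this example and do not come from the text.

```python
from statistics import mean


def impute_mean(values):
    """Return a copy of values with each None replaced by the mean of the observed values.

    Hypothetical helper for illustration: None is assumed to mark a missing value.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        # With no observed values there is nothing to base a replacement on.
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    # Keep observed values as they are; substitute the mean where a value is missing.
    return [fill if v is None else v for v in values]


# Made-up sample data with two missing entries.
ages = [30, None, 40, 50, None]
imputed_ages = impute_mean(ages)
print(imputed_ages)
```

Note that the imputed list has the same number of cases as the original, so no case is dropped from later analysis. The trade-off is that mean substitution shrinks the variability of the variable, which is one reason more careful methods are often preferred in practice.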
Because missing data can create problems during analysis, imputation is seen as a way to avoid the dangers of simply discarding the cases that have missing values. In fact, some statistical packages default to discarding any case that has a missing value, which may introduce bias or affect the representativeness of the results. Imputation preserves...