And sometimes omitting missing values is not reasonable or even possible, for example because there are only a few observations or because the data do not seem to be missing at random. Data imputation is a real alternative in such situations: it replaces NA with actual values computed by various algorithms, such as filling empty cells with:
A known scalar
The previous value appearing in the column (hot-deck)
A random element from the same column
The most frequent value in the column
Values drawn from the same column with given probabilities
Predicted values based on regression or machine learning models
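To make the list above concrete, here is a minimal, language-neutral sketch in plain Python, with None standing in for NA. It is only an illustration and not the data.table or VIM API. All function names are hypothetical. The "previous value" fill is the simplest last-observation-carried-forward variant of hot-deck, and the regression fill assumes a single fully observed predictor column with at least two distinct values.

```python
import random
from collections import Counter
from typing import List, Optional

Column = List[Optional[float]]  # None plays the role of NA


def observed(col: Column) -> List[float]:
    """Return the non-missing values of a column."""
    return [x for x in col if x is not None]


def fill_scalar(col: Column, value: float) -> Column:
    """Replace every missing cell with a known scalar."""
    return [value if x is None else x for x in col]


def fill_previous(col: Column) -> Column:
    """Carry the last observed value forward (basic hot-deck).
    Leading gaps have no previous value and stay None."""
    out, last = [], None
    for x in col:
        if x is not None:
            last = x
        out.append(x if x is not None else last)
    return out


def fill_random(col: Column, rng: random.Random) -> Column:
    """Replace each missing cell with a random observed value."""
    pool = observed(col)
    return [rng.choice(pool) if x is None else x for x in col]


def fill_mode(col: Column) -> Column:
    """Replace missing cells with the most frequent observed value."""
    mode = Counter(observed(col)).most_common(1)[0][0]
    return fill_scalar(col, mode)


def fill_weighted(col: Column, values: List[float], weights: List[float],
                  rng: random.Random) -> Column:
    """Draw each replacement from `values` with the given weights."""
    return [rng.choices(values, weights=weights, k=1)[0] if x is None else x
            for x in col]


def fill_regression(y: Column, x: Column) -> Column:
    """Predict missing y from x with an ordinary least-squares line
    fitted on the rows where both x and y are observed."""
    pairs = [(xi, yi) for xi, yi in zip(x, y)
             if xi is not None and yi is not None]
    n = len(pairs)
    mx = sum(px for px, _ in pairs) / n
    my = sum(py for _, py in pairs) / n
    sxx = sum((px - mx) ** 2 for px, _ in pairs)
    sxy = sum((px - mx) * (py - my) for px, py in pairs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [intercept + slope * xi if yi is None and xi is not None else yi
            for xi, yi in zip(x, y)]
```

For example, `fill_previous([1.0, None, 3.0, 3.0, None])` fills the gaps with 1.0 and 3.0, while `fill_mode` on the same column fills both gaps with 3.0. The random and weighted variants take an explicit `random.Random` instance so results can be reproduced with a fixed seed.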
The hot-deck method is often used when joining multiple datasets. In such a situation, the roll argument of data.table can be very useful and efficient; otherwise, be sure to check out the hotdeck function in the VIM package, which also offers some really useful ways of visualizing missing data. But when dealing with an existing column of a dataset, we have some other...