Mastering Data Analysis with R

By: Gergely Daróczi

Data imputation


Sometimes omitting missing values is not reasonable or even possible, for example because there are too few observations or because the data do not appear to be missing at random. Data imputation is a practical alternative in such situations: it replaces NA with actual values chosen by one of various algorithms, such as filling empty cells with:

  • A known scalar

  • The previous value appearing in the column (hot-deck)

  • A random element from the same column

  • The most frequent value in the column

  • Different values from the same column with given probability

  • Predicted values based on regression or machine learning models
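The strategies listed above can be sketched in a few lines of base R. This is only a minimal illustration under stated assumptions: the vector `x`, the helper column `z`, the fill value `0`, and the use of observed relative frequencies as the "given probability" are all hypothetical choices made for the example, not taken from the book.

```r
# Toy column with missing values (hypothetical data for illustration)
x <- c(3, NA, 5, 5, NA, 8, NA)
miss <- is.na(x)
obs <- x[!miss]

# 1. A known scalar (0 is an arbitrary choice here)
x_scalar <- replace(x, miss, 0)

# 2. The previous value in the column (last observation carried forward)
last_seen <- cummax(ifelse(miss, 0L, seq_along(x)))
last_seen[last_seen == 0L] <- NA  # leading NAs have nothing to carry forward
x_locf <- x[last_seen]

# 3. A random element from the same column
set.seed(42)
x_random <- x
x_random[miss] <- obs[sample.int(length(obs), sum(miss), replace = TRUE)]

# 4. The most frequent value in the column
freq <- table(obs)
mode_val <- as.numeric(names(freq)[which.max(freq)])
x_mode <- replace(x, miss, mode_val)

# 5. Different values with given probability
#    (assumption: probabilities are the observed relative frequencies)
vals <- as.numeric(names(freq))
probs <- as.vector(freq) / sum(freq)
x_prob <- x
x_prob[miss] <- vals[sample.int(length(vals), sum(miss),
                                replace = TRUE, prob = probs)]

# 6. Predicted values from a regression model
#    (assumption: a fully observed helper column z is available)
df <- data.frame(z = seq_along(x), y = x)
fit <- lm(y ~ z, data = df)  # lm() drops rows with NA in y by default
y_reg <- df$y
y_reg[miss] <- predict(fit, newdata = df[miss, , drop = FALSE])
```

Each result keeps the observed values untouched and only fills the positions flagged in `miss`, which makes it easy to compare the strategies side by side.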

The hot-deck method is often used when joining multiple datasets. In such a situation, the roll argument of data.table can be very useful and efficient; otherwise, be sure to check out the hotdeck function in the VIM package, which offers some really useful ways of visualizing missing data. But when dealing with an already given column of a dataset, we have some other...
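As a rough sketch of the rolling join mentioned above, the following example carries the last known value forward while joining two tables with data.table. The tables `prices` and `trades` and their columns are hypothetical and exist only for illustration; the example assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical lookup table: a price is only recorded at some time points
prices <- data.table(time = c(1, 5, 10), price = c(100, 105, 103))

# Hypothetical events that need a price, at times that may not match exactly
trades <- data.table(time = c(2, 5, 7, 12))

# For each row of trades, take the row of prices with the same time or,
# failing that, the most recent earlier one (roll = TRUE rolls it forward)
res <- prices[trades, on = "time", roll = TRUE]
```

Here `res` has one row per trade, with `price` filled from the latest earlier observation instead of NA, which is what an ordinary join would return for the unmatched times.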