R Deep Learning Essentials

By: Joshua F. Wiley
Overview of this book

Deep learning is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using model architectures. With the superb memory management and the full integration with multi-node big data platforms, the H2O engine has become more and more popular among data scientists in the field of deep learning.

This book will introduce you to the deep learning package H2O with R and help you understand the concepts of deep learning. We will start by setting up important deep learning packages available in R and then move towards building models related to neural networks, prediction, and deep prediction, all of this with the help of real-life examples.

After installing the H2O package, you will learn about prediction algorithms. Moving ahead, concepts such as overfitting data, anomalous data, and deep prediction models are explained. Finally, the book will cover concepts relating to tuning and optimizing models.

Dealing with missing data


When working with real-world applications, we often must contend with missing data. H2O includes a function, h2o.impute(), that fills in missing values of a variable using its mean, median, or mode, optionally computed separately within groups defined by other variables.
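To make the idea concrete before turning to H2O, here is a minimal base R sketch of mean and median imputation, overall and within groups. It does not call H2O at all: the helper impute_vec() and the toy data frame are hypothetical names introduced only for this illustration, and mode imputation for factors would follow the same pattern.

```r
## Base R sketch of mean/median imputation, overall and by group.
## impute_vec() and `toy` are hypothetical, for illustration only.
impute_vec <- function(x, method = c("mean", "median")) {
  method <- match.arg(method)
  obs <- x[!is.na(x)]
  if (length(obs) == 0L) return(x)  # nothing observed: leave as missing
  fill <- switch(method,
                 mean   = mean(obs),
                 median = median(obs))
  x[is.na(x)] <- fill
  x
}

toy <- data.frame(
  group = c("a", "a", "a", "b", "b", "b"),
  value = c(1, NA, 3, 10, 20, NA)
)

## overall imputation: every NA gets the mean of all observed values
overall <- impute_vec(toy$value, "mean")

## grouped imputation: each NA gets the mean of its own group
grouped <- ave(toy$value, toy$group,
               FUN = function(x) impute_vec(x, "mean"))
```

In this toy example the overall mean fills both gaps with the same value, whereas the grouped version fills each gap with a value typical of its own group.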

To examine how to impute missing data this way, we will use the small Iris dataset on flowers. In particular, we will set the petal width and length values to missing for the species "setosa" and then impute their values:

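## load packages and start a local H2O cluster (skip if already done)
library(data.table)
library(h2o)
h2o.init()

## setup iris data with some missing
d <- as.data.table(iris)
d[Species == "setosa", c("Petal.Width", "Petal.Length") := .(NA, NA)]

## one copy keeps the missing values, the other will be imputed
h2o.dmiss <- as.h2o(d, destination_frame="iris_missing")
h2o.dmeanimp <- as.h2o(d, destination_frame="iris_missing_imp")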
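Before imputing, it can help to confirm how much data is actually missing. The following sketch does this in plain base R, without an H2O cluster; `chk` and `na.counts` are hypothetical names, and `chk` is simply a local data frame with the same missing values as above.

```r
## Base R check, independent of H2O: recreate the missing values on a
## plain data frame and count the NAs in each column.
chk <- iris
chk[chk$Species == "setosa", c("Petal.Width", "Petal.Length")] <- NA

na.counts <- colSums(is.na(chk))
na.counts
```

Only the two petal columns should show missing values, one for each setosa flower.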

First, we will do a simple mean imputation. This has to be done one column at a time:

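## mean imputation
## find the columns with any missing values; sapply() checks each column
## directly rather than coercing the whole table to a matrix
missing.cols <- colnames(h2o.dmiss)[sapply(d, anyNA)]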

for (v in missing.cols) {
  h2o.dmeanimp <- h2o.impute(h2o.dmeanimp, column = v)
}
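To see what this loop amounts to, here is a rough base R analogue that needs no H2O cluster. It is a sketch under the assumption that simple mean imputation replaces each missing value with the mean of the observed values in the same column; `chk`, `miss.cols`, and `imp` are hypothetical names used only here.

```r
## Base R analogue of the per-column mean imputation loop (no H2O needed).
chk <- iris
chk[chk$Species == "setosa", c("Petal.Width", "Petal.Length")] <- NA

## columns that contain at least one missing value
miss.cols <- names(chk)[vapply(chk, anyNA, logical(1))]

imp <- chk
for (v in miss.cols) {
  ## replace each NA with the mean of the observed values in that column
  imp[[v]][is.na(imp[[v]])] <- mean(chk[[v]], na.rm = TRUE)
}

anyNA(imp)  # check whether any missing values remain
```

Because every setosa value is missing in these two columns, the fill values are computed entirely from the other two species.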

One problem with imputing...