R Deep Learning Essentials

By: Joshua F. Wiley

Overview of this book

Deep learning is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using model architectures. With the superb memory management and the full integration with multi-node big data platforms, the H2O engine has become more and more popular among data scientists in the field of deep learning.

This book will introduce you to the deep learning package H2O with R and help you understand the concepts of deep learning. We will start by setting up important deep learning packages available in R and then move towards building models related to neural networks, prediction, and deep prediction, all of this with the help of real-life examples.

After installing the H2O package, you will learn about prediction algorithms. Moving ahead, concepts such as overfitting data, anomalous data, and deep prediction models are explained. Finally, the book will cover concepts relating to tuning and optimizing models.

R packages for deep learning


Although there are a number of R packages for machine learning, there are comparatively few available for neural networks and deep learning. In this section, we will see how to install all the necessary R packages and set them up to use neural networks and deep learning.

It is helpful to have a good integrated development environment (IDE) for working with R and doing data analysis. I use Emacs, a powerful text editor, along with Emacs Speaks Statistics (ESS), which helps Emacs work nicely with R. An easy way to get up and running is to use a modified distribution of Emacs designed to work nicely with R and for statistics. It is created and maintained by Vincent Goulet and is freely available at http://vgoulet.act.ulaval.ca/en/emacs/. Another popular R IDE is RStudio (https://www.rstudio.com/). One advantage of both Emacs and RStudio is that they are available on all major platforms (Windows, Mac, and Linux), so even if you switch computers you can have a consistent IDE experience.

Setting up reproducible results

Software for data science is advancing and changing rapidly. Although this is wonderful for progress, it can make reproducing someone else's results a challenge. Even your own code may not work when you go back to it a few months later. One way to address this issue is to make a record of what versions of software were used and ensure there is a snapshot of them available. For this book, we will use the R package checkpoint provided by Revolution Analytics; this works in connection with their server, which provides daily snapshots (checkpoints) of the Comprehensive R Archive Network (CRAN). To learn more about this process, you can read the online vignette for the package available at https://cran.r-project.org/web/packages/checkpoint/vignettes/checkpoint.html.
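As a small illustration of recording which software versions were used, the following sketch writes the R version and a few package versions to a text file using only base R. The file name versions.txt and the choice of packages are assumptions made for this example; this is a manual complement to checkpoint, not a part of it.

```r
## Record the R version and selected package versions (base R only).
## The packages listed and the output file name are illustrative choices.
r_version <- R.version.string
pkgs <- c("stats", "utils")
pkg_versions <- vapply(
  pkgs,
  function(p) as.character(utils::packageVersion(p)),
  character(1)
)

## Write one line for R and one line per package
version_file <- file.path(tempdir(), "versions.txt")  # hypothetical location
writeLines(
  c(r_version, paste(names(pkg_versions), pkg_versions)),
  version_file
)
print(readLines(version_file))
```

For a complete record of the current session, sessionInfo() prints the R version, the platform, and every attached package.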

This book was written using R version 3.2.3, nicknamed Wooden Christmas-Tree, on Windows 10 Professional x64. Although this is the latest version of R at the time of writing, CRAN keeps copies of older R versions as new ones are released, both as binaries (for Windows, at https://cran.r-project.org/bin/windows/base/old/) and as source tarballs (https://cran.r-project.org/src/base/R-3/), which can be used to compile R from source on your operating system.

For H2O, one of the main R packages we will use for deep learning, we will also need Java installed. This book was written using the 64-bit Java SE Development Kit 8 update 66. You can download Java for your operating system at http://www.oracle.com/technetwork/java/javase/.
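Before going further, it can be useful to confirm from within R that a Java executable is visible. The following is a minimal sketch using base R only; it assumes Java is expected on the system PATH, which may not match every installation.

```r
## Look for a Java executable on the PATH.
## Sys.which() returns an empty string when nothing is found.
java_path <- unname(Sys.which("java"))
java_found <- nzchar(java_path)

if (java_found) {
  ## "java -version" writes to stderr, so capture both output streams
  java_info <- suppressWarnings(
    system2("java", "-version", stdout = TRUE, stderr = TRUE)
  )
  message(paste(java_info, collapse = "\n"))
} else {
  message("No Java executable found on the PATH; install Java before using H2O")
}
```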

With those steps done, we are ready to get started. To use the checkpoint package, put all your R scripts for one project together in a single folder. Installing R packages using the checkpoint package is a somewhat circular process. The checkpoint package works by scanning R scripts in the project directory to see what packages are loaded (and therefore that it needs to install), by checking for calls to the library() or require() functions. Of course, we cannot actually use the library() function until we have installed the packages.
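To make the scanning idea concrete, here is a simplified sketch that searches the R scripts in a folder for library() and require() calls using a regular expression. It is only an illustration of the concept, not how the checkpoint package is actually implemented, and find_packages() is a hypothetical helper written for this example.

```r
## Illustrative only: a simplified, regex-based scan for library()/require()
## calls. find_packages() is a hypothetical helper, not part of checkpoint.
find_packages <- function(project_dir) {
  scripts <- list.files(project_dir, pattern = "\\.[Rr]$", full.names = TRUE)
  code <- unlist(lapply(scripts, readLines, warn = FALSE))
  code <- sub("#.*$", "", code)  # drop comments, so commented-out calls are ignored
  pattern <- "(library|require)\\([[:space:]]*[\"']?([A-Za-z][A-Za-z0-9.]*)[\"']?"
  hits <- regmatches(code, regexpr(pattern, code))  # first call on each line
  sort(unique(sub(pattern, "\\2", hits)))           # keep only the package name
}

## Try it on a throwaway project folder
demo_dir <- file.path(tempdir(), "demo_project")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(
  c("## library(notloaded)",
    "library(checkpoint)",
    "library(RCurl)",
    "require(\"MASS\")"),
  file.path(demo_dir, "checkpoint.R")
)
found <- find_packages(demo_dir)
print(found)
```

Because the scan reads files from the disk, a script that has been edited but not saved would not contribute any new package names.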

To begin with, create an R script in your project directory called checkpoint.R with the following code:

## uncomment to install the checkpoint package
## install.packages("checkpoint")
library(checkpoint)

checkpoint("2016-02-20", R.version = "3.2.3")

Once you have created the R script, you can uncomment and run the install.packages() line to install the checkpoint package. You only need to do this once, so when you are done it's best to comment the line out again so the package is not re-installed each time you run the file. This is the file we will run each time we want to set up our R environment for this deep learning project. The checkpoint for this book is 20th February 2016 and we are using R version 3.2.3. Next, we can add library() calls for some packages we will need to be available by adding the following code to our checkpoint.R script (but note that these are not run yet!):

## Chapter 1 ##

## Tools
library(RCurl)
library(jsonlite)
library(caret)
library(e1071)

## basic stats packages
library(statmod)
library(MASS)

Tip

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:

  • Log in or register to our website using your e-mail address and password.

  • Hover the mouse pointer on the SUPPORT tab at the top.

  • Click on Code Downloads & Errata.

  • Enter the name of the book in the Search box.

  • Select the book for which you're looking to download the code files.

  • Choose from the drop-down menu where you purchased this book.

  • Click on Code Download.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR / 7-Zip for Windows

  • Zipeg / iZip / UnRarX for Mac

  • 7-Zip / PeaZip for Linux

Once we have added that code, save the file so that any changes are written to the disk, and then run the first couple of lines to load the checkpoint package and the call to checkpoint(). The results should look something like Figure 1.5:

Figure 1.5

The checkpoint package asks to create a directory to store specific versions of the packages used, and then finds all packages and installs them. The next sections show how to set up some specific R packages for deep learning.
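After checkpoint() has finished, one way to confirm that the packages are available is to test each one with requireNamespace(), which reports TRUE or FALSE without attaching the package. This is a minimal sketch; the package names are the ones from the library() calls added to checkpoint.R earlier, and the check itself is an addition for illustration rather than a step checkpoint requires.

```r
## Packages listed in checkpoint.R so far
wanted <- c("RCurl", "jsonlite", "caret", "e1071", "statmod", "MASS")

## requireNamespace() returns TRUE or FALSE without attaching the package
available <- vapply(wanted, requireNamespace, logical(1), quietly = TRUE)
print(available)

## Report anything that still needs to be installed
missing_pkgs <- names(available)[!available]
if (length(missing_pkgs) > 0) {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
}
```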

Neural networks

There are several packages in R that can fit basic neural networks. The nnet package is a recommended package and can fit feed-forward neural networks with one hidden layer, like the one shown in Figure 1.3. For more details on the nnet package, see Venables, W. N. and Ripley, B. D. (2002). The neuralnet package also fits shallow neural networks with one hidden layer, but can train them using back-propagation and allows custom error and neuron activation functions. Finally, we come to the RSNNS package, which is an R wrapper of the Stuttgart Neural Network Simulator (SNNS). The SNNS was originally written in C, but was ported to C++. RSNNS allows many types of models to be fit in R. Common models are available using convenient wrappers, but the RSNNS package also makes many model components from SNNS available, making it possible to train a wide variety of models. For more details on the RSNNS package, see Bergmeir, C., and Benítez, J. M. (2012).

We will see examples of how to use these models in Chapter 2, Training a Prediction Model. For now, we can install them by adding the following code to the checkpoint.R script and saving it. Saving is important because, if our changes to the R script are not written to the disk, the checkpoint() function will not see the changes and will not find and install the new packages:

## neural networks
library(nnet)
library(neuralnet)
library(RSNNS)

Now, if we re-run the checkpoint() function and it is successful, R should tell us that it discovered eight packages and that it installed nnet, neuralnet, RSNNS, and Rcpp, a dependency for the RSNNS package.
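As a quick check that the installation works, the following sketch fits a small feed-forward network with one hidden layer using nnet on the built-in iris data. The number of hidden units, the weight decay, and the iteration limit are arbitrary values chosen for illustration, not recommendations; the code is skipped if nnet is not available.

```r
## A minimal sketch: one hidden layer with 3 units on the built-in iris data.
## The size, decay, and maxit values are arbitrary illustrative choices.
nnet_available <- requireNamespace("nnet", quietly = TRUE)

if (nnet_available) {
  set.seed(1234)  # initial weights are random, so fix the seed
  fit <- nnet::nnet(Species ~ ., data = iris, size = 3,
                    decay = 0.01, maxit = 200, trace = FALSE)

  ## Predicted class labels for the training data
  pred <- predict(fit, newdata = iris, type = "class")
  print(table(predicted = pred, actual = iris$Species))
}
```

The table compares predictions with the true species on the same data used for training, so it says nothing about how the model would perform on new data.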

The deepnet package

The deepnet package provides a number of tools for deep learning in R. Specifically, it can train restricted Boltzmann machines (RBMs) and use these as part of deep belief networks (DBNs) to generate initial values to train deep neural networks. The deepnet package also allows for different activation functions, and the use of dropout for regularization. To install it, we follow the same process we used before: adding the following code to the checkpoint.R script, saving it, and then re-running the checkpoint() function:

## deep learning
library(deepnet)

The darch package

The darch package is based on Matlab code by Geoffrey Hinton and stands for deep architectures. It can train RBMs and DBNs along with a variety of options related to each. A limitation of the darch package is that, because it is a pure R implementation, model training tends to be slow. To install it, we follow the same process we used before: adding the following code to the checkpoint.R script, saving it, and then re-running the checkpoint() function:

## deep learning
library(darch)

The H2O package

The H2O package provides an interface to the H2O software. H2O is written in Java and is fast and scalable. It provides not only deep learning functionality, but also a variety of other popular machine learning algorithms and models, and the model results can be stored as pure Java code to allow fast scoring, facilitating the deployment of models to solve real-world problems. To install it, we follow the same process we used before: adding the following code to the checkpoint.R script, saving it, and then re-running the checkpoint() function:

## deep learning
library(h2o)
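Once the package is installed, a first session might look like the following sketch, which starts a local H2O instance, copies the built-in iris data into it, and trains a small deep learning model. The thread count, memory limit, hidden layer sizes, and number of epochs are assumptions chosen to keep the example small, and the code only runs if both the h2o package and a Java executable are found.

```r
## A hedged sketch: runs only if the h2o package and Java are both available.
## The memory limit, layer sizes, and epochs are arbitrary illustrative values.
h2o_ready <- requireNamespace("h2o", quietly = TRUE) &&
  nzchar(Sys.which("java"))

if (h2o_ready) {
  library(h2o)
  h2o.init(nthreads = 2, max_mem_size = "1G")  # start a local H2O instance
  iris_h2o <- as.h2o(iris)                     # copy the data into H2O

  model <- h2o.deeplearning(
    x = setdiff(names(iris), "Species"),       # predictor columns
    y = "Species",                             # outcome column
    training_frame = iris_h2o,
    hidden = c(10, 10),                        # two hidden layers
    epochs = 5
  )
  print(h2o.performance(model))
  h2o.shutdown(prompt = FALSE)                 # stop the local instance
}
```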