R Deep Learning Projects

Overview of this book

R is a popular programming language among statisticians and mathematicians for statistical analysis, and it is widely used for deep learning. Deep learning is one of today's trending topics and is finding practical applications in many domains. This book demonstrates end-to-end implementations of five real-world projects on popular deep learning topics: handwritten digit recognition, traffic light detection, fraud detection, text generation, and sentiment analysis. You'll learn how to train effective neural networks in R, including convolutional neural networks, recurrent neural networks, and LSTMs, and apply them in practical scenarios. The book also shows how neural networks can be trained using GPU capabilities. You will use popular R libraries and packages, such as MXNetR, H2O, deepnet, and more, to implement the projects. By the end of this book, you will have a better understanding of deep learning concepts and techniques and how to use them in a practical setting.

Sentiment analysis from movie reviews


Let's continue with the IMDb data and put the ideas from the previous sections into practice. In this section, we will use a few familiar packages, such as tidytext, plyr, and dplyr, as well as the excellent text2vec by Dmitriy Selivanov, which was released in 2017, and the well-known caret package by Max Kuhn.

Data preprocessing

We need to prepare our data for the algorithm.

First, a few imports that will be necessary:

library(plyr)
library(dplyr)
library(text2vec)
library(tidytext)
library(caret)

We will use the IMDb data as before:

imdb <- read.csv("./data/labeledTrainData.tsv", encoding = "UTF-8", quote = "", sep = "\t", stringsAsFactors = FALSE)
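
The options passed to read.csv matter for free text such as reviews. The following is a minimal sketch on a made-up two-row file, not the real dataset; the column names id, sentiment, and review are an assumption about the file layout. It shows that with quote = "" quotation marks inside a review are kept as ordinary characters, and that sep = "\t" splits columns on tabs only:

```r
# Hypothetical toy file; the id/sentiment/review layout is assumed for illustration
toy_path <- tempfile(fileext = ".tsv")
writeLines(c(
  "id\tsentiment\treview",
  "1\t1\tA \"great\" movie",
  "2\t0\tIt's a bad movie"
), toy_path)

# Same options as used for the IMDb file
toy <- read.csv(toy_path, encoding = "UTF-8", quote = "", sep = "\t",
                stringsAsFactors = FALSE)

# Reviews stay character vectors (not factors) and keep their quotation marks
str(toy)
print(toy$review)
```

A quick str() call like this on the real imdb data frame is a cheap way to confirm that every row was read and that the review column is a character vector before tokenizing.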

Next, we tokenize the reviews and create an iterator over the tokens:

tokens <- space_tokenizer(imdb$review)
token_iterator <- itoken(tokens)
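
To see what these two calls produce, here is a small sketch on made-up sentences (the toy_ names are hypothetical and not part of the book's code). It illustrates that space_tokenizer splits on spaces only, so case and punctuation are left untouched:

```r
library(text2vec)

# Made-up review text, for illustration only
toy_reviews <- c("A great movie, really!", "A bad movie")

# space_tokenizer() returns one character vector of tokens per document
toy_tokens <- space_tokenizer(toy_reviews)
print(toy_tokens[[1]])

# itoken() wraps the token list in an iterator that other text2vec functions consume
toy_iterator <- itoken(toy_tokens, progressbar = FALSE)
```

Because "movie," and "movie" would count as different tokens, it may be worth lowercasing the text or stripping punctuation beforehand if that distinction is unwanted.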

The tokens are single words, also known as unigrams. The set of unique tokens constitutes our vocabulary:

vocab <- create_vocabulary(token_iterator)
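
As a rough illustration of what the vocabulary contains, the sketch below builds one from two made-up sentences (the toy_ names are hypothetical). It assumes a recent text2vec version, in which the vocabulary is a data frame with the columns term, term_count, and doc_count:

```r
library(text2vec)

# Made-up corpus, for illustration only
toy_reviews <- c("a great movie", "a bad movie")
toy_iterator <- itoken(space_tokenizer(toy_reviews), progressbar = FALSE)

# One row per unique token, with corpus-wide and per-document counts
toy_vocab <- create_vocabulary(toy_iterator)
print(toy_vocab)

# How often "movie" occurs overall, and in how many documents
movie_row <- toy_vocab[toy_vocab$term == "movie", ]
print(movie_row$term_count)
print(movie_row$doc_count)
```

The two count columns are what any later filtering of rare or overly common words would rely on.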

It's important for the co-occurrence matrix to include only words that appear...