Deep Learning with R for Beginners

By: Mark Hodnett, Joshua F. Wiley, Yuxi (Hayden) Liu, Pablo Maldonado
Overview of this book

Deep learning has a range of practical applications in several domains, while R is the preferred language for designing and deploying deep learning models. This Learning Path introduces you to the basics of deep learning and teaches you to build a neural network model from scratch. As you make your way through the chapters, you’ll explore deep learning libraries and learn how to create deep learning models for a variety of challenges, from anomaly detection to recommendation systems. The Learning Path then covers advanced topics, such as generative adversarial networks (GANs), transfer learning, and large-scale deep learning in the cloud, in addition to model optimization, overfitting, and data augmentation. Through real-world projects, you’ll also get up to speed with training convolutional neural networks (CNNs), recurrent neural networks (RNNs), and long short-term memory networks (LSTMs) in R. By the end of this Learning Path, you’ll be well versed in deep learning and have the skills you need to implement a number of deep learning concepts in your research work or projects.

Sentiment analysis from movie reviews


Let's continue with the IMDb data and put the ideas from the previous sections into practice. In this section, we will use a few familiar packages, such as tidytext, plyr, and dplyr, as well as the excellent text2vec by Dmitriy Selivanov, which was released in 2017, and the well-known caret package by Max Kuhn.

Data preprocessing

We need to prepare our data for the algorithm.

First, a few imports that will be necessary:

library(plyr)
library(dplyr)
library(text2vec)
library(tidytext)
library(caret)

We will use the IMDb data as before:

# Tab-separated file; quote = "" keeps quotation marks inside reviews as literal text
imdb <- read.csv("./data/labeledTrainData.tsv", encoding = "UTF-8", quote = "", sep = "\t", stringsAsFactors = FALSE)
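To see what these read.csv arguments do without the full dataset, here is a minimal sketch on a tiny temporary file. The file contents and the name toy are made up for illustration; only the column layout (id, sentiment, review) is assumed to mirror the real file.

```r
# Hypothetical three-line file standing in for labeledTrainData.tsv
tmp <- tempfile(fileext = ".tsv")
writeLines(c(
  "id\tsentiment\treview",
  "1\t1\tA \"great\" movie, do not miss it",
  "2\t0\tBoring and far too long"
), tmp)

# sep = "\t" splits on tabs only, so commas inside a review are harmless;
# quote = "" disables quote handling, so the quotation marks stay in the text
toy <- read.csv(tmp, sep = "\t", quote = "", stringsAsFactors = FALSE)

str(toy)           # review is a character column, not a factor
print(toy$review)  # the quotation marks around "great" are preserved
```

Because stringsAsFactors is FALSE, the review column stays a character vector, which is what the tokenizer in the next step expects.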

And create an iterator over the tokens:

tokens <- space_tokenizer(imdb$review)
token_iterator <- itoken(tokens)
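The following sketch shows what space_tokenizer produces on two made-up reviews. The toy strings and the lowercasing and punctuation-stripping step are assumptions added for illustration; the original pipeline tokenizes the raw reviews directly.

```r
library(text2vec)

# Hypothetical toy reviews standing in for imdb$review
toy_reviews <- c("A great movie!", "Not great. Not at all.")

# space_tokenizer splits on single spaces only, so case and punctuation are kept
toy_tokens <- space_tokenizer(toy_reviews)
print(toy_tokens[[1]])

# Optional normalisation (an assumption, not part of the original code):
# lowercase the text and strip punctuation before splitting
clean_reviews <- gsub("[[:punct:]]", "", tolower(toy_reviews))
clean_tokens <- space_tokenizer(clean_reviews)
print(clean_tokens[[2]])

# itoken wraps the list of token vectors in an iterator that
# other text2vec functions consume
toy_iterator <- itoken(clean_tokens, progressbar = FALSE)
```

Without normalisation, "great" and "great." count as different tokens, which inflates the vocabulary. Whether that matters depends on the model built later.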

The tokens are single words, also known as unigrams. Together, the distinct tokens constitute our vocabulary:

vocab <- create_vocabulary(token_iterator)
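As a rough illustration of what the vocabulary contains, the sketch below builds one from two hand-written token vectors. The toy tokens are made up, and the column names term, term_count, and doc_count are those of recent text2vec releases; older versions may lay the object out differently.

```r
library(text2vec)

# Hypothetical pre-tokenized documents standing in for the IMDb reviews
toy_tokens <- list(
  c("a", "great", "movie"),
  c("not", "great", "not", "at", "all")
)

toy_iterator <- itoken(toy_tokens, progressbar = FALSE)
toy_vocab <- create_vocabulary(toy_iterator)

# One row per distinct unigram:
#   term_count = total occurrences across all documents
#   doc_count  = number of documents containing the term
print(toy_vocab)
```

Here "not" occurs twice but in a single document, while "great" occurs once in each of the two documents, so the two counts carry different information.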

It's important for the co-occurrence matrix to include only words that appear...