Book Image

Java Deep Learning Cookbook

By : Rahul Raj
Book Image

Java Deep Learning Cookbook

By: Rahul Raj

Overview of this book

Java is one of the most widely used programming languages in the world. With this book, you will see how to perform deep learning using Deeplearning4j (DL4J) – the most popular Java library for training neural networks efficiently. This book starts by showing you how to install and configure Java and DL4J on your system. You will then gain insights into deep learning basics and use your knowledge to create a deep neural network for binary classification from scratch. As you progress, you will discover how to build a convolutional neural network (CNN) in DL4J, and understand how to construct numeric vectors from text. This deep learning book will also guide you through performing anomaly detection on unsupervised data and help you set up neural networks in distributed systems effectively. In addition to this, you will learn how to import models from Keras and change the configuration in a pre-trained DL4J model. Finally, you will explore benchmarking in DL4J and optimize neural networks for optimal results. By the end of this book, you will have a clear understanding of how you can use DL4J to build robust deep learning applications in Java.
Table of Contents (14 chapters)

Using Word2Vec for sentence classification using CNNs

Neural networks require numerical inputs to perform their operations as expected. For text inputs, we cannot directly feed text data into a neural network. Since Word2Vec converts text data to vectors, it is possible to exploit Word2Vec so that we can use it with neural networks. We will use a pretrained Google News vector model as a reference and train a CNN network on top of it. At the end of this process, we will develop an IMDB review classifier to classify reviews as positive or negative. As per the paper found at https://arxiv.org/abs/1408.5882, combining a pretrained Word2Vec model with a CNN will give us better results.

We will employ custom CNN architecture along with the pretrained word vector model as suggested by Yoon Kim in his 2014 publication, https://arxiv.org/abs/1408.5882. The architecture is slightly more...