Java Deep Learning Cookbook

By: Rahul Raj
Overview of this book

Java is one of the most widely used programming languages in the world. With this book, you will see how to perform deep learning using Deeplearning4j (DL4J) – the most popular Java library for training neural networks efficiently. This book starts by showing you how to install and configure Java and DL4J on your system. You will then gain insights into deep learning basics and use your knowledge to create a deep neural network for binary classification from scratch. As you progress, you will discover how to build a convolutional neural network (CNN) in DL4J, and understand how to construct numeric vectors from text. This deep learning book will also guide you through performing anomaly detection on unsupervised data and help you set up neural networks in distributed systems effectively. In addition to this, you will learn how to import models from Keras and change the configuration in a pre-trained DL4J model. Finally, you will explore benchmarking in DL4J and optimize neural networks for optimal results. By the end of this book, you will have a clear understanding of how you can use DL4J to build robust deep learning applications in Java.

Determining the right activation function

The purpose of an activation function is to introduce non-linearity into a neural network. Non-linearity helps a neural network learn more complex patterns. We will discuss some important activation functions and their respective DL4J implementations.

The following are the activation functions that we will consider:

  • Tanh
  • Sigmoid
  • ReLU (short for Rectified Linear Unit)
  • Leaky ReLU
  • Softmax

In this recipe, we will walk through the key steps to decide the right activation functions for a neural network.

How to do it...

  1. Choose an activation function according to the network layer: We need to know which activation functions to use for the input/hidden layers and which to use for the output layer. ReLU is usually a good default for input/hidden layers.
  2. Choose the right activation function to handle data impurities: Inspect the data that you feed to the neural network. Do your inputs contain mostly negative values that could lead to dead neurons? Choose the activation function accordingly: use Leaky ReLU if dead neurons are observed during training.
  3. Choose the right activation function to handle overfitting: Observe the evaluation metrics and how they vary across training epochs. Understand gradient behavior and how well your model performs on new, unseen data.
  4. Choose the right activation function as per the expected output of your use case: Examine the desired outcome of your network as a first step. For example, the softmax function can be used when you need to measure the probability of occurrence of each output class; it is used in the output layer. For input/hidden layers, ReLU is what you need in most cases. If you're not sure what to use, start by experimenting with ReLU; if that doesn't meet your expectations, try other activation functions.
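As a concrete illustration of step 4, the following is a minimal plain-Java sketch (the class and method names are my own, not DL4J's) showing why softmax suits a probability-producing output layer: its outputs are non-negative and sum to 1, so they can be read directly as class probabilities.

```java
public class SoftmaxDemo {

    // Numerically stable softmax: subtract the max logit before exponentiating.
    static double[] softmax(double[] logits) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : logits) max = Math.max(max, v);
        double sum = 0.0;
        double[] out = new double[logits.length];
        for (int i = 0; i < logits.length; i++) {
            out[i] = Math.exp(logits[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) out[i] /= sum;
        return out;
    }

    public static void main(String[] args) {
        double[] probs = softmax(new double[]{2.0, 1.0, 0.1});
        double sum = 0.0;
        for (double p : probs) sum += p;
        // The probabilities always sum to 1, whatever the raw scores were.
        System.out.println("sum of probabilities = " + sum);
    }
}
```

The largest logit always gets the largest probability, which is why the arg-max over softmax outputs gives the predicted class.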

How it works...

For step 1, ReLU is the most commonly used hidden-layer activation because of its simple, non-saturating non-linear behavior. The output layer's activation function depends on the expected output behavior, which is what step 4 addresses.

For step 2, Leaky ReLU is an improved version of ReLU that is used to avoid the zero-gradient problem, although you might observe a slight performance drop. We use Leaky ReLU if dead neurons are observed during training. Dead neurons are neurons with a zero gradient for all possible inputs, which makes them useless for training.
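The difference between a dead ReLU unit and a Leaky ReLU unit can be sketched in plain Java (an illustrative forward pass and gradient only, not DL4J's implementation; the alpha value of 0.01 is an assumed slope):

```java
public class LeakyReluDemo {

    // Standard ReLU: zero output and zero gradient for negative inputs.
    static double relu(double x) { return Math.max(0.0, x); }
    static double reluGrad(double x) { return x > 0 ? 1.0 : 0.0; }

    // Leaky ReLU keeps a small slope (alpha) for negative inputs,
    // so the gradient never becomes exactly zero.
    static double leakyRelu(double x, double alpha) {
        return x > 0 ? x : alpha * x;
    }
    static double leakyReluGrad(double x, double alpha) {
        return x > 0 ? 1.0 : alpha;
    }

    public static void main(String[] args) {
        double x = -3.0;
        System.out.println("ReLU grad:       " + reluGrad(x));            // 0.0 -> no learning signal
        System.out.println("Leaky ReLU grad: " + leakyReluGrad(x, 0.01)); // 0.01 -> still learns
    }
}
```

A neuron whose inputs are always negative gets a zero gradient under plain ReLU on every example, which is exactly the dead-neuron condition described above; the small alpha slope keeps a gradient flowing.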

For step 3, the tanh and sigmoid activation functions are similar and are used in feed-forward networks. If you use these activation functions, then make sure you add regularization to network layers to avoid the vanishing gradient problem. These are generally used for classifier problems.
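A plain-Java sketch of sigmoid and tanh and their derivatives (my own helper methods, not DL4J's) shows the saturation behavior behind the vanishing gradient problem: for inputs far from zero, the derivative is close to zero, so very little gradient flows back through the layer.

```java
public class SaturationDemo {

    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)); its maximum is 0.25 at x = 0.
    static double sigmoidGrad(double x) {
        double s = sigmoid(x);
        return s * (1.0 - s);
    }

    // tanh'(x) = 1 - tanh(x)^2; its maximum is 1.0 at x = 0.
    static double tanhGrad(double x) {
        double t = Math.tanh(x);
        return 1.0 - t * t;
    }

    public static void main(String[] args) {
        System.out.printf("sigmoid'(0)  = %.4f%n", sigmoidGrad(0));
        System.out.printf("sigmoid'(10) = %.8f%n", sigmoidGrad(10)); // nearly zero: saturated
        System.out.printf("tanh'(10)    = %.8f%n", tanhGrad(10));    // nearly zero: saturated
    }
}
```

Deep stacks of saturated sigmoid/tanh layers multiply these tiny derivatives together, which is why the text recommends extra care when using them.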

There's more...

The ReLU activation function is non-linear, hence the backpropagation of errors can easily be performed. Backpropagation is the backbone of neural networks: it is the learning algorithm that computes the gradient of the loss with respect to the weights across neurons, which gradient descent then uses to update those weights. The following are the ReLU variations currently supported in DL4J:

  • ReLU: The standard ReLU activation function:
public static final Activation RELU
  • ReLU6: ReLU activation, which is capped at 6, where 6 is an arbitrary choice:
public static final Activation RELU6
  • RReLU: The randomized ReLU activation function:
public static final Activation RRELU
  • ThresholdedReLU: Threshold ReLU:
public static final Activation THRESHOLDEDRELU
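The forward passes of these variants can be sketched in plain Java (illustrative only, not the DL4J implementations; the threshold parameter name is my own, and RReLU is omitted because its negative slope is sampled randomly during training):

```java
public class ReluVariantsDemo {

    // Standard ReLU: max(0, x).
    static double relu(double x) { return Math.max(0.0, x); }

    // ReLU6: ReLU capped at 6.
    static double relu6(double x) { return Math.min(Math.max(0.0, x), 6.0); }

    // Thresholded ReLU: passes x only when it exceeds a threshold theta.
    static double thresholdedRelu(double x, double theta) {
        return x > theta ? x : 0.0;
    }

    public static void main(String[] args) {
        System.out.println(relu(10.0));                // unbounded above
        System.out.println(relu6(10.0));               // capped at 6.0
        System.out.println(thresholdedRelu(0.5, 1.0)); // below threshold -> 0.0
    }
}
```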

There are a few more implementations, such as SELU (short for Scaled Exponential Linear Unit), which is similar to the ReLU activation function but has a slope for negative values.
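SELU can likewise be sketched in plain Java (an illustration, not DL4J's code; the lambda and alpha constants are the standard published values for self-normalizing networks):

```java
public class SeluDemo {

    // Standard SELU scaling constants.
    static final double LAMBDA = 1.0507009873554805;
    static final double ALPHA  = 1.6732632423543772;

    // Linear for positive inputs; a scaled exponential slope for negative ones,
    // so negative inputs still produce a nonzero output and gradient.
    static double selu(double x) {
        return x > 0 ? LAMBDA * x : LAMBDA * ALPHA * (Math.exp(x) - 1.0);
    }

    public static void main(String[] args) {
        System.out.println(selu(1.0));  // positive input: scaled linearly
        System.out.println(selu(-1.0)); // negative input: nonzero, unlike ReLU
    }
}
```

The nonzero response for negative inputs is the "slope for negative values" mentioned above.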