Book Image

Hands-On Java Deep Learning for Computer Vision

By : Klevis Ramo
Book Image

Hands-On Java Deep Learning for Computer Vision

By: Klevis Ramo

Overview of this book

Although machine learning is an exciting world to explore, you may feel confused by all of its theoretical aspects. As a Java developer, you will be used to telling the computer exactly what to do, instead of being shown how data is generated; this causes many developers to struggle to adapt to machine learning. The goal of this book is to walk you through the process of efficiently training machine learning and deep learning models for Computer Vision using the most up-to-date techniques. The book is designed to familiarize you with neural networks, enabling you to train them efficiently, customize existing state-of-the-art architectures, build real-world Java applications, and get great results in a short space of time. You will build real-world Computer Vision applications, ranging from a simple Java handwritten digit recognition model to real-time Java autonomous car driving systems and face recognition models. By the end of this book, you will have mastered the best practices and modern techniques needed to build advanced Computer Vision Java applications and achieve production-grade accuracy.
Table of Contents (8 chapters)

Building a handwritten digit recognizer

By building a handwritten digit recognizer in a Java application, we will practically implement most of the techniques and optimizations learned so far. The application is built using the open source Java framework, Deeplearning4j. The dataset used is the classic MNIST database of handwritten digits. (http://yann.lecun.com/exdb/mnist/). The training dataset is oversized, having 60,000 images, while the test data set contains 10,000 images. The images are 28 x 28 in size and grayscale in terms of terms.

As a part of the application that we will be creating in this section, we will implement a graphical user interface, where you can draw digits and get a neural network to recognize the digit.

Jumping straight into the code, let's observe how to implement a neural network in Java. We begin with the parameters; the first one is the output. Since we have 0 to 9 digits, we have 10 classes:

/**
* Number prediction classes.
* We have 0-9 digits so 10 classes in total.
*/
private static final int OUTPUT = 10;

We have the mini-batch size, which is the number of images we see before updating the weights or the number of images we'll process in parallel:

/**
* Mini batch gradient descent size or number of matrices processed in parallel.
* For CORE-I7 16 is good for GPU please change to 128 and up
*/
private static final int MINI_BATCH_SIZE = 16;// Number of training epochs
/**
* Number of total traverses through data.
* with 5 epochs we will have 5/@MINI_BATCH_SIZE iterations or weights updates
*/
private static final int EPOCHS = 5;

When we consider this for a CPU, the batch size of 16 is alright, but for GPUs, this needs to change according to the GPU power. One epoch is fulfilled when we traverse all the data.

The learning rate is quite important, because having a very low value will slow down the learning, and having bigger values of the learning rate will cause the neural network to actually diverge:

/**
* The alpha learning rate defining the size of step towards the minimum
*/
private static final double LEARNING_RATE = 0.01;

To understand this in detail, in the latter half of this section, we will simulate a case where we diverge by changing the learning rate. Fortunately, as part of this example, we need not handle the reading, transferring, or normalizing of the pixels from the MNIST dataset. We do not need to concern ourselves with transforming the data to a one-dimensional vector to fit to the neural network. This is because everything is encapsulated and offered by Deeplearning4j.

Under the object dataset iterator, we need to specify the batch size and whether we are going to use it for training or testing, which will help classify whether we need to load 60,000 images from the training dataset, or 10,000 from the testing dataset:

public void train() throws Exception {
/*
Create an iterator using the batch size for one iteration
*/
log.info("Load data....");
DataSetIterator mnistTrain = new MnistDataSetIterator(MINI_BATCH_SIZE, true, SEED);
/*
Construct the neural neural
*/
log.info("Build model....");

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
.seed(SEED)
.learningRate(LEARNING_RATE)
.weightInit(WeightInit.XAVIER)
//NESTEROVS is referring to gradient descent with momentum
.updater(Updater.NESTEROVS)
.list()

Let's get started with building a neural network. We've specified the learning rate, and initialized the weight according to Xavier, which we have learned in the previous sections. The updater in the code is actually just the optimization algorithm for updating the weights with a gradient descent. The NESTEROVS is basically the gradient descent with momentum that we're already familiar with.

Let's look into the code to understand the updater better. We look at the two formulas that are actually not different from what we have already explored.

We configure the input layer, hidden layers, and the output. Configuration of the input layer is quite easy; we just need to multiply the width and the weight and we have this one-dimensional vector size. The next step in the code is to define the hidden layers. We have two hidden layers, actually: one with 128 neurons and one with 64 neurons, both having an activation function because of its high efficiency.

Just to switch things up a bit, we could try out different values, especially those defined by the MNIST dataset web page. Despite that, the values chosen here are quite efficient, with less training time and good accuracy.

The output layer, which uses the softmax, because we need ten classes and not 2, we also have the cost function. The details for this may vary from what we have seen previously. This function measures the performance of the hypothetical values against the real values.

We then initialize and define the function, as we want to see the cost function for every 100 iterations. The model.fit (minstTrain) is very important, because this actually works iteration by iteration, as defined by many, it traverses all the data. After this, we have executed one epoch and the neural network has learned to use good weights.

Testing the performance of the neural network

To test the accuracy of the network, we construct another dataset for testing. We evaluate this model with what we've learned so far and print the statistics. If the accuracy of the network is more than 97%, we stop there and save the model to use for the graphical user interface that we will study later on. Execute the following code:

if (mnistTest == null) {
mnistTest = new MnistDataSetIterator(MINI_BATCH_SIZE, false, SEED);
}

The cost function is being printed and if you observe it closely, it gradually decreases through the iterations. From time to time, we have a peak in the value of the cost function. This is a characteristic of the mini-batch gradient descent. The final output of the first epoch shows us that the model has 96% accuracy just for one epoch, which is great. This means the neural network is learning fast.

In most cases, it does not work like this and we need to tune our network for a long time before we obtain the output we want. Let's look at the output of the second epoch:

We obtain an accuracy of more than 97% in just two epochs.

Another aspect that we need to draw our attention to is how a simple model is achieving really great results. This is a part of the reason why deep learning is taking off. It is easy to obtain good results, and it is easy to work with.

As mentioned before, let's look at a case of disconverging by increasing the learning rate to 0.6:

private static final double LEARNING_RATE = 0.01;

/**
* https://en.wikipedia.org/wiki/Random_seed
*/
private static final int SEED = 123;
private static final int IMAGE_WIDTH = 28;
private static final int IMAGE_HEIGHT = 28;

If we now run the network, we will observe that the cost function will continue to increase with no signs of decreasing. The accuracy is also affected greatly. The cost function for one epoch almost stays the same, despite having 3,000 iterations. The final accuracy of the model is approximately 10%, which is a clear indication that this is not the best method.

Let's run the application for various digits and see how it works. Let's begin with the number 3:

The output is accurate. Run this for any of the numbers that lie between Zero to nine and check whether your model is working accurately.

Also, keep in mind that the model is not perfect yet-we shall improve this with CNN architectures in the next chapter. They offer state-of-the-art techniques and high accuracy, and we should be able to achieve an accuracy of 99%.