Hands-On Java Deep Learning for Computer Vision

By : Klevis Ramo

Hands-On Java Deep Learning for Computer Vision

By: Klevis Ramo

Overview of this book

Although machine learning is an exciting world to explore, you may feel confused by all of its theoretical aspects. As a Java developer, you will be used to telling the computer exactly what to do, instead of being shown how data is generated; this causes many developers to struggle to adapt to machine learning. The goal of this book is to walk you through the process of efficiently training machine learning and deep learning models for Computer Vision using the most up-to-date techniques. The book is designed to familiarize you with neural networks, enabling you to train them efficiently, customize existing state-of-the-art architectures, build real-world Java applications, and get great results in a short space of time. You will build real-world Computer Vision applications, ranging from a simple Java handwritten digit recognition model to real-time Java autonomous car driving systems and face recognition models. By the end of this book, you will have mastered the best practices and modern techniques needed to build advanced Computer Vision Java applications and achieve production-grade accuracy.

Preface

Who this book is for

What this book covers

To get the most out of this book

Get in touch

Free Chapter

Introduction to Computer Vision and Training Neural Networks

The computer vision state

Exploring neural networks

How does a neural network learn?

Organizing data and applications

Effective training techniques

Optimizing algorithms

Configuring the training parameters of the neural network

Representing images and outputs

Building a handwritten digit recognizer

Summary

Convolutional Neural Network Architectures

Understanding edge detection

Building a Java edge detection application

Convolution on RGB images

Working with convolutional layers' parameters

Pooling layers

Building and training a Convolution Neural Network

Improving the handwritten digit recognition application

Summary

Transfer Learning and Deep CNN Architectures

Working with classical networks

Using residual networks for image recognition

The power of 1 x 1 convolutions and the inception network

Applying transfer learning

Neural networks

Building an animal image classification – using transfer learning and VGG-16 architecture

Summary

Real-Time Object Detection

Resolving object localization

Object detection with the sliding window solution

Convolutional sliding window

Detecting objects with the YOLO algorithm

Max suppression and anchor boxes

Building a real-time video, car, and pedestrian detection application

Summary

Creating Art with Neural Style Transfer

What are convolution network layers learning?

Neural style transfer

Applying content cost function

Applying style cost function

Building a neural network that produces art

Summary

Face Recognition

Problems in face detection

Differentiating inputs with Siamese networks

Exploring triplet loss

Binary classification

Building a face recognition Java application

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think

Customer Reviews

5 star

4 star

3 star

2 star

1 star

Organizing data and applications

In this section, we will look at the old and new techniques for organizing data. We will also gain some intuition as to what results our model may produce until it is ready for production. By the end of this section, we will have explored how neural networks are implemented to obtain a really high performance rate.

Organizing your data

Just like any other network, a neural network depends on data. Previously, we used datasets containing 1,000 to 100,000 rows of data. Even in cases where more data was added, the low computational power of the systems would not allow us to organize this kind of data efficiently.

We always begin with training our network, which implies that we in fact need a training dataset that should consist of 60% of the total data in the dataset. This is a very important step, as here is where the neural network learns the values of the weights present in the dataset. The second phase is to see how well the network does with data that it has never seen before, which consists of 20% of the data in the dataset. This dataset is known as a cross-validation dataset. The aim of this phase is to see how the the network generalizes data that it was not trained for.

Based on the performance in this phase, we can vary the parameters to attain the best possible output. This phase will continue until we have achieved optimum performance. The remaining data in the dataset can now be used as the test dataset. The reason for having this dataset is to have a completely unbiased evaluation of the network. This is basically to understand the behavior of the network toward data that it has not seen before and not been optimized for.

If we were to have a visual representation of the organization of data as described previously, here is what it would look like as in the following diagram:

This configuration was well known and widely used until recent years. Multiple variations to the percentages of data allocated to each dataset also existed. In the more recent era of deep learning, two things have changed substantially:

Millions of rows of data are present in the datasets that we are currently using.
The computational power of our processing systems has increased drastically because of advanced GPUs.

Due to these reasons, neural networks now have a deeper, bigger, and more complex architecture. Here is how our data will be organized:

The training dataset has increased immensely; observe that 96% of the data will be used as a dataset, while 2% will be required for the development dataset and the remaining 2% for the testing dataset. In fact, it is even possible to have 99% of the data used to train the model and the remaining 1% of data to be divided between the training and the development datasets. At some points, it's OK to not have a test dataset at all. The only time we need to have a test dataset is when we need to have a completely unbiased evaluation. Through the course of this chapter, we shall hardly use the test dataset.

Notice how the cross-validation dataset becomes the development dataset. The functionality of the dataset does not vary.

Bias and variance

During the training of the neural network, the model may undergo various symptoms. One of them is high bias value. This leads to a high error rate on our training dataset and therefore, consecutively, a similar error on the development dataset. What this tells us is that our network has not learned how to solve the problem or to find the pattern. Graphically, we can represent the model like this:

This graph depicts senior boundaries and the errors caused by the model, where it marks the red dots as green squares and vice versa.

We may also have to worry about the high variance problem. Assume that the neural network does a great job during the training phase and figures out a really complex decision bundle, almost perfectly. But, when the same model uses the development testing dataset, it performs poorly, having a high error rate and an output that is not different from the high bias error graph:

If we look at the previous graph, it looks like a neural network really learned to be specific to the training dataset, and when it encounters examples it hasn't seen before, it doesn't know how to categorize our data.

The final unfortunate case is where our neural network may have both of these two symptoms together. This is a case where we see a high error rate on the training dataset and a double or higher error rate for the development dataset. This can be depicted graphically as follows:

The first line is the decision boundary from the training dataset, and the second line is the decision boundary for the development dataset, which is even worse.

Computational model efficiency

Neural networks are currently learning millions of weights. Millions of weights mean millions of multiplications. This makes it essential to find a highly efficient model to do this multiplication, and that is done by using matrices. The following diagram depicts how weights are placed in a matrix:

The weight matrix here has one row and four columns, and the inputs are in another matrix. These inputs can be the outputs of the previous hidden layer.

To find the output, we need to simply perform a simple multiplication of these two matrices. This means that is the multiplication of the row and the column.

To make it more complex, let us vary our neural network to have one more hidden layer.

Having a new hidden layer will change our matrix as well. All the weights from the hidden layer 2 will be added as a second row to the matrix. The value is the multiplication of the second row of the matrix with the column containing the input values:

Notice now how and can be actually calculated in parallel, because they don't have any dependents, so really, the multiplication of the first row with the inputs column is not dependent on the multiplication of the second row with the inputs column.

To make this more complex, we can have another set of examples that will affect the matrix as follows:

We now have four sets and we can actually calculate each of them in parallel. Consider , which is the result of the multiplication of the first row with the first input column, while this is the multiplication of the second row of weights with the second column of the input.

In standard computers, we currently have 16 of these operations carried out in parallel. But the biggest gain here is when we use GPUs, because GPUs enable us to execute from 100 to 1,000 of these operations in parallel. One of the reasons that deep learning has been taking off recently is because of GPUs offering really great computational power.

Hands-On Java Deep Learning for Computer Vision

By : Klevis Ramo

Hands-On Java Deep Learning for Computer Vision

By: Klevis Ramo

Overview of this book

Related Content you might be interested in

Current Title:

Hands-On Java Deep Learning for Computer Vision

Hands-On Mathematics for Deep Learning

Java Deep Learning Projects

Practical Convolutional Neural Networks