Hands-On Java Deep Learning for Computer Vision

By: Klevis Ramo

Overview of this book

Although machine learning is an exciting world to explore, you may feel confused by all of its theoretical aspects. As a Java developer, you will be used to telling the computer exactly what to do, instead of being shown how data is generated; this causes many developers to struggle to adapt to machine learning. The goal of this book is to walk you through the process of efficiently training machine learning and deep learning models for Computer Vision using the most up-to-date techniques. The book is designed to familiarize you with neural networks, enabling you to train them efficiently, customize existing state-of-the-art architectures, build real-world Java applications, and get great results in a short space of time. You will build real-world Computer Vision applications, ranging from a simple Java handwritten digit recognition model to real-time Java autonomous car driving systems and face recognition models. By the end of this book, you will have mastered the best practices and modern techniques needed to build advanced Computer Vision Java applications and achieve production-grade accuracy.

Exploring neural networks

In this section, we will learn how artificial neurons are connected together to form a neural network. We will build a neural network and get familiar with its computational representation.

Neural networks were first inspired by biological neurons. When we try to analyze the similarities between an artificial neuron and a biological one, we realize there isn't much in common. The harsh truth here is that we don't even know exactly what a single neuron does, and there are still knowledge gaps regarding how connected neurons learn together so efficiently. But if we were to draw conclusions, we could say that all neurons have the same basic structure, which consists of two major regions:

  • The region for receiving and processing incoming information from other cells. This involves the dendrites, which receive the input information, and the nucleus, which processes or transforms the information.
  • The region that conducts and transmits information to other cells. The axon, or the axon terminals, forward this information to many other cells or neurons.

Building a single neuron

Let's understand how to implement a neural network on a computer by expressing a single neuron mathematically, as follows:

The inputs here are numbers, which are fed into a computational unit. We do not know exactly how a biological neuron works, but when creating an artificial network, we are free to define the process ourselves.

Let us build a computational unit that processes the data in two steps, as depicted in the preceding diagram. The first step sums all the input values, and the second step applies a sigmoid function to that sum.

The purpose of the sigmoid function is to squash the sum into a value between 0 and 1: the output approaches 1 when the sum is large and positive, and approaches 0 when the sum is large and negative. In this example, the sum of X1, X2, X3, and X4 is -3, so the sigmoid function returns a small value close to 0. We will use 0.1 as a round figure for this output in the rest of the example (the exact value of the sigmoid at -3 is about 0.047).

The sigmoid function, which is applied after the sum, is called the activation function, and is denoted by a.
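
The two steps described above can be sketched in Java as follows. This is a minimal illustration rather than the book's own implementation: the class name SingleNeuron and the four input values are assumptions, chosen only so that the inputs sum to -3.

```java
// Minimal sketch of a single artificial neuron: sum the inputs, then apply a sigmoid.
// The class name and the input values are illustrative assumptions.
final class SingleNeuron {

    // The sigmoid squashes any real number into the open interval (0, 1).
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Step 1: sum all inputs. Step 2: apply the activation function to the sum.
    static double activate(double[] inputs) {
        double sum = 0.0;
        for (double x : inputs) {
            sum += x;
        }
        return sigmoid(sum);
    }

    public static void main(String[] args) {
        double[] x = {1.0, -2.0, 3.0, -5.0}; // hypothetical X1..X4, summing to -3
        System.out.println(activate(x));      // a small value close to 0
    }
}
```

Because the sum is negative, the activation lands near the lower end of the (0, 1) range.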

Building a single neuron with multiple outputs

As stated previously, a biological neuron provides its output to multiple cells. If we continue to use the example in the previous section, our neuron should forward the attained value of 0.1 to multiple cells. For the sake of this example, let's assume that there are three neurons.

If we provide the same output of 0.1 to all the neurons, they will all give us the same output, which isn't really useful. This raises a question: why provide the value to three or more neurons when we could do the same with only one?

To make this computationally useful, we apply some weights, where each weight has a different value. We multiply the output of the activation function by each of these weights to obtain a different value for each neuron. Look at the example depicted in the following diagram:

Here, we can clearly see that we assign the values w1 = 2, w2 = -1, and w3 = 3 to the three weights and obtain the outputs Z1 = 0.2, Z2 = -0.1, and Z3 = 0.3. We can connect these different values to three neurons, and the output of each will be different.
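
The weighting step can be sketched in a few lines of Java. The class and method names here are hypothetical; only the activation value 0.1 and the weights 2, -1, and 3 come from the example above.

```java
// Sketch: one activation value fanned out to several neurons through distinct weights.
// The class and method names are illustrative assumptions.
final class WeightedOutputs {

    // Multiplies a single activation value by each outgoing weight.
    static double[] fanOut(double activation, double[] weights) {
        double[] z = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            z[i] = weights[i] * activation;
        }
        return z;
    }

    public static void main(String[] args) {
        double[] z = fanOut(0.1, new double[]{2.0, -1.0, 3.0});
        // Roughly 0.2, -0.1, and 0.3; floating-point rounding may show in the last digits.
        System.out.println(java.util.Arrays.toString(z));
    }
}
```

Each receiving neuron now sees a different value even though all three are fed from the same source neuron.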

Building a neural network

So now that we have the structure for one neuron, it's time to build a neural network. A neural network, just like a neuron, has three parts:

  • The input layer
  • The output layer
  • The hidden layers

The following diagram should help you visualize the structure better:

Usually, we have many hidden layers with hundreds or thousands of neurons, but here, we have just two hidden layers: the first with one neuron and the second with three neurons.

The first layer gives us one output, obtained by applying the activation function. By applying different weights to this output, we produce three different values and connect them to three new neurons, each of which applies an activation function. Lastly, we sum these values and apply a sigmoid function to the sum to obtain the final output. You could add more hidden layers to this as well.
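
The flow through this small network can be sketched as a single forward pass. This is a minimal sketch under stated assumptions: the class name TinyNetwork is hypothetical, every activation is taken to be a sigmoid, and the values reaching the output neuron are summed without additional weights, as in the description above.

```java
// Sketch of a forward pass: inputs -> hidden layer 1 (one neuron)
// -> hidden layer 2 (one neuron per weight) -> single output neuron.
// Names and the use of a sigmoid in every layer are illustrative assumptions.
final class TinyNetwork {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    static double forward(double[] inputs, double[] hiddenWeights) {
        // Hidden layer 1: a single neuron that sums the inputs and applies the sigmoid.
        double sum = 0.0;
        for (double x : inputs) {
            sum += x;
        }
        double a1 = sigmoid(sum);

        // Hidden layer 2: each neuron receives the same activation scaled by its own
        // weight and applies its own activation function.
        double outputSum = 0.0;
        for (double w : hiddenWeights) {
            outputSum += sigmoid(w * a1);
        }

        // Output layer: apply the sigmoid to the sum of the hidden layer 2 activations.
        return sigmoid(outputSum);
    }

    public static void main(String[] args) {
        double[] inputs = {1.0, -2.0, 3.0, -5.0}; // hypothetical inputs
        double[] weights = {2.0, -1.0, 3.0};      // one weight per hidden layer 2 neuron
        System.out.println(forward(inputs, weights)); // a value between 0 and 1
    }
}
```

Adding another hidden layer would amount to repeating the weight-and-activate step before the final sum.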

The indexes assigned to each weight in the diagram are decided based on the starting neuron of the first hidden layer and the receiving neuron of the second hidden layer. Thus, the indexes for the weights leaving the first hidden layer are w11, w21, and w31.

The indexes for the Z values are assigned in a similar manner. The first index represents the neuron that receives the weighted value, and the second index represents the neuron in the previous hidden layer that the Z value comes from.

Similarly, we may want the input layer to be connected to different neurons, and we can do that simply by multiplying the input values by weights. The following diagram depicts an additional neuron in hidden layer 1:

Notice that we have now added several more Z values, which are simply the contributions of this new neuron. The second index for these will be 2, because they come from the second neuron.
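
The two-index scheme maps naturally onto a two-dimensional array. The following sketch assumes that the first index is the receiving neuron, the second index is the source neuron, and that each receiving neuron sums its incoming contributions, as in a standard feed-forward network. The class name, the weights, and the activation of the second source neuron are hypothetical, and Java arrays are 0-based, so Z11 in the text corresponds to z[0][0] here.

```java
// Sketch: per-connection contributions z[j][i] = w[j][i] * a[i], where
// j is the receiving neuron and i is the source neuron (0-based in Java).
// Names and values are illustrative assumptions.
final class LayerContributions {

    // Computes the contribution of every source neuron i to every receiving neuron j.
    static double[][] contributions(double[][] w, double[] a) {
        double[][] z = new double[w.length][a.length];
        for (int j = 0; j < w.length; j++) {
            for (int i = 0; i < a.length; i++) {
                z[j][i] = w[j][i] * a[i];
            }
        }
        return z;
    }

    // Sums the contributions arriving at each receiving neuron.
    static double[] netInputs(double[][] z) {
        double[] sums = new double[z.length];
        for (int j = 0; j < z.length; j++) {
            for (double contribution : z[j]) {
                sums[j] += contribution;
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        double[] a = {0.1, 0.4};                           // two neurons in hidden layer 1
        double[][] w = {{2.0, 1.0}, {-1.0, 0.5}, {3.0, -2.0}}; // three receiving neurons
        double[] sums = netInputs(contributions(w, a));
        System.out.println(java.util.Arrays.toString(sums));
    }
}
```

The second column of z holds exactly the extra Z values that appear when the second neuron is added.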

Finally, we need a clear way to distinguish between weights and Z values that have the same indexes but belong to different hidden layers. We can apply a superscript, as shown in the following diagram:

The superscript 1 indicates that all of these weights and Z values contribute to hidden layer 1. To distinguish further, we add the superscript 2 to the weights and Z values that contribute to hidden layer 2, which separates a weight in layer 1 from a weight with the same indexes in layer 2. Likewise, we add the superscript 3 to the weights that feed the output layer, because those contribute to layer 3. The following diagram depicts all the layers with their superscripts:

In general, we will mention the superscript index only if it is necessary, because it makes the network messy.
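
In code, the layer superscript can be treated as one more array index. The sketch below is an assumption about how one might store the weights, not the book's implementation: weights[l][j][i] holds the weight into neuron j of layer l + 1 from neuron i of the layer before it, and every neuron applies a sigmoid to the sum of its weighted inputs. The class name and all values are hypothetical.

```java
// Sketch: the layer superscript becomes the outermost array index.
// weights[l][j][i] is the weight into neuron j of layer l + 1 from neuron i
// of the previous layer. Names and values are illustrative assumptions.
final class LayeredNetwork {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Propagates the inputs through every layer in turn and returns the final activations.
    static double[] forward(double[] inputs, double[][][] weights) {
        double[] a = inputs;
        for (double[][] layer : weights) {
            double[] next = new double[layer.length];
            for (int j = 0; j < layer.length; j++) {
                double z = 0.0;
                for (int i = 0; i < a.length; i++) {
                    z += layer[j][i] * a[i];
                }
                next[j] = sigmoid(z);
            }
            a = next;
        }
        return a;
    }

    public static void main(String[] args) {
        double[][][] weights = {
            {{1.0, 1.0}},              // layer 1: one neuron, two inputs
            {{2.0}, {-1.0}, {3.0}},    // layer 2: three neurons
            {{1.0, 1.0, 1.0}}          // layer 3 (output): one neuron
        };
        System.out.println(forward(new double[]{1.0, -4.0}, weights)[0]);
    }
}
```

Keeping the layer as a separate index is what lets the same pair of neuron indexes be reused in every layer without ambiguity.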