Hands-On Java Deep Learning for Computer Vision

By: Klevis Ramo

Overview of this book

Although machine learning is an exciting world to explore, you may feel confused by all of its theoretical aspects. As a Java developer, you will be used to telling the computer exactly what to do, instead of being shown how data is generated; this causes many developers to struggle to adapt to machine learning. The goal of this book is to walk you through the process of efficiently training machine learning and deep learning models for Computer Vision using the most up-to-date techniques. The book is designed to familiarize you with neural networks, enabling you to train them efficiently, customize existing state-of-the-art architectures, build real-world Java applications, and get great results in a short space of time. You will build real-world Computer Vision applications, ranging from a simple Java handwritten digit recognition model to real-time Java autonomous car driving systems and face recognition models. By the end of this book, you will have mastered the best practices and modern techniques needed to build advanced Computer Vision Java applications and achieve production-grade accuracy.

How does a neural network learn?

In this section, we will look at how a simple model makes predictions and how it learns from data. We will then move on to deep networks, which will give us some insight into why they perform better than simple networks.

Assume we are given a task to predict whether a person could have heart disease in the near future. We have a considerable amount of data about the history of the individual and whether they got heart disease later on or not.

The parameters that will be taken into consideration are age, height, weight, genetic factors, whether the patient is a smoker or not, and their lifestyle. Let us begin by building a simple model:

We use all the information we have about the individual as input and call these values features. As we learned in the previous section, the next step is to multiply each feature by its weight, sum these products, and pass the sum to the activation function, which here is a sigmoid. The sigmoid outputs a value between 0 and 1: close to 1 when the sum is large and positive, and close to 0 when it is large and negative. For simplicity, we round this value to 1 or 0:

In this case, the value produced by the activation function is also the output, since we don't have any hidden layers. We interpret an output of 1 to mean that the person will not have heart disease, and an output of 0 to mean that the person will have heart disease in the near future.
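The forward pass of this simple model can be sketched in Java as follows. This is a minimal sketch, not the book's implementation: the class name `SimpleNeuron`, the numeric encoding of the features, and the example weights are all assumptions made for illustration. The sigmoid activation is rounded at 0.5 to obtain the 1 or 0 described above.

```java
/** Minimal single-neuron model: a weighted sum of features passed through a sigmoid. */
final class SimpleNeuron {

    /** Logistic sigmoid: maps any real number to a value between 0 and 1. */
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Multiplies each feature by its weight and sums the products. */
    static double weightedSum(double[] features, double[] weights) {
        if (features.length != weights.length) {
            throw new IllegalArgumentException("features and weights must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < features.length; i++) {
            sum += features[i] * weights[i];
        }
        return sum;
    }

    /** Rounded output: 1 = no heart disease, 0 = heart disease (the convention used in the text). */
    static int predict(double[] features, double[] weights) {
        return sigmoid(weightedSum(features, weights)) >= 0.5 ? 1 : 0;
    }

    public static void main(String[] args) {
        // Hypothetical encoding: age, height (cm), weight (kg), affected relatives,
        // smoker (0 = no, 1 = yes), lifestyle (0 = poor, 1 = good).
        double[] features = {60, 180, 75, 3, 0, 1};
        // Hypothetical weights, chosen only to show the mechanics.
        double[] weights = {-0.02, 0.01, -0.01, -0.5, -1.0, 2.0};

        double sum = weightedSum(features, weights);
        System.out.println("weighted sum = " + sum);
        System.out.println("activation   = " + sigmoid(sum));
        System.out.println("prediction   = " + predict(features, weights));
    }
}
```

In practice, features on very different scales (such as age and a 0/1 smoker flag) are usually normalized before being fed to the network; that step is omitted here to keep the sketch short.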

Let's use a comparative example with three individuals to check whether this model functions appropriately:

As we can see in the preceding diagram, here are the input values for person 1:

  • Age = 60 years old
  • Height = 180 centimeters
  • Weight = 75 kilograms
  • Number of people in their family affected by a heart disease = 3
  • Non-smoker
  • Has a good lifestyle

The input values for person 2 are as follows:

  • Age = 50 years old
  • Height = 170 centimeters
  • Weight = 120 kilograms
  • Number of people in their family affected by a heart disease = 7
  • Smoker
  • Has a sedentary lifestyle

The input values for person 3 are as follows:

  • Age = 40 years old
  • Height = 175 centimeters
  • Weight = 85 kilograms
  • Number of people in their family affected by a heart disease = 4
  • Light smoker
  • Has a very good and clean lifestyle

So if we had to come up with a probability of heart disease for each of them, we might come up with something like this:

For person 1, there is just a 20% chance of heart disease because of their good family history and the fact that they do not smoke and have a good lifestyle. For person 2, the chances of being affected by heart disease are clearly much higher because of their family history, heavy smoking, and very poor lifestyle. For person 3, we are not quite sure, so we give them a 50/50 chance: the person smokes lightly, but also has a really good lifestyle, and their family history is not that bad. We also factor in that this individual is quite young.

If we think about how we as humans learned to estimate this probability, we find that we worked out the impact of each feature on the person's overall health. A good lifestyle has a positive impact on the overall output, while genetics and family history have a very negative impact, weight has a negative impact, and so on.

It just so happens that neural networks learn in a similar manner; the only difference is that they capture the impact of each feature by figuring out the weights. For lifestyle, a large positive weight reinforces the positive contribution of lifestyle to the equation. For genetics and family history, however, the neural network will assign a much smaller or negative weight so that these features contribute negatively to the equation. In reality, neural networks are busy figuring out a lot of weights.

Now let's see how neural networks actually learn the weights.

Learning neural network weights

To understand this section, let us assume that the person in question will definitely be affected by heart disease eventually, which means that the expected output of our sigmoid function is 0.

We begin by assigning some random non-zero values to the weights in the equation, as shown in the following diagram:

We do this because we do not really know what the initial value of the weights should be.

We now do what we learned in the previous section: we move in the forward direction of our network, which is from the input layer to the output layer. We multiply the features by the weights and sum the products before passing the sum to the sigmoid function. Here is what we obtain as the final output:

The weighted sum is 4109, which, when passed through the activation function, gives a final output of 1. This is the complete opposite of the answer we were looking for.

What do we do to improve the situation? The answer to this question is a backward pass, which means we move through our model from the output layer to the input layer so that during the next forward pass, we can obtain much better results.

To counter this, the neural network will try to vary the values of the weights, as depicted in the following diagram:

It lowers the weight of the age feature so that age contributes negatively to the equation. It also slightly increases the lifestyle weight because this contributes positively, and it applies negative weights to the genetic factors and body weight.

We do another forward pass, and this time we get a smaller value of 275, but we still get an output of 1 from the sigmoid function:

We do a backward pass again and this time we may have to vary the weights even further:

The next time we do a forward pass, the equation produces a negative value, and if we apply this to a sigmoid function, we have a final output of zero:

Comparing this output of 0 to the required value, we see that they match, so it's time to stop: the network now knows how to predict this example.
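To see why sums of 4109 and 275 both end up as an output of 1 while a negative sum ends up as 0, we can evaluate the sigmoid directly. The sketch below uses the two sums quoted in the text; the negative sum of -50 is a hypothetical stand-in, since the text does not give the actual value, and the class and method names are assumptions.

```java
/** Shows how the sigmoid saturates for large positive and negative weighted sums. */
final class SigmoidSaturation {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Rounds the activation to the 1/0 output used in the text. */
    static int toOutput(double activation) {
        return activation >= 0.5 ? 1 : 0;
    }

    public static void main(String[] args) {
        // 4109 and 275 come from the text; -50 is a hypothetical negative sum.
        double[] sums = {4109, 275, -50};
        for (double z : sums) {
            double activation = sigmoid(z);
            System.out.println("sum = " + z
                    + ", sigmoid = " + activation
                    + ", output = " + toOutput(activation));
        }
    }
}
```

For large positive sums, the sigmoid is so close to 1 that shrinking the sum from 4109 to 275 does not change the rounded output; only once the sum becomes negative does the output flip to 0.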

A forward pass and a backward pass together are called one iteration. In reality, we have 1,000, 100,000, or even millions of examples, and before we change the weights, we take into account the contribution of each of them: we sum up the contributions of all the examples and then change the weights.

Updating the neural network weights

The sum of the products of the features and weights is given to the sigmoid, or activation, function. The result is called the hypothesis. We begin with a theory of what the output will look like, and then see how wrong we are when the results turn out to be different from what we actually require.

To measure how inaccurate our theories are, we require a loss, or cost, function:

The loss, or cost, function is based on the difference between the hypothesis and the real value that we know from the data. We sum over the examples to make sure that the model accounts for all of them and not just one. We square the difference so that the value stays positive and so that large differences between the hypothesis and the true value are exaggerated, which pushes the neural network to keep the error as low as possible.
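A cost function of this kind can be sketched in Java as a sum of squared differences over all examples. This is an assumption-based sketch: the exact formula is not reproduced in the text, so a plain sum of squares is used here (many formulations also divide by the number of examples), and the names `SquaredErrorCost`, `hypothesis`, and `cost` are hypothetical.

```java
/** Sum-of-squared-errors cost for a single sigmoid neuron. */
final class SquaredErrorCost {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Hypothesis: sigmoid of the weighted sum of the features. */
    static double hypothesis(double[] features, double[] weights) {
        double sum = 0.0;
        for (int j = 0; j < features.length; j++) {
            sum += features[j] * weights[j];
        }
        return sigmoid(sum);
    }

    /** Sums the squared difference between hypothesis and real value over all examples. */
    static double cost(double[][] examples, double[] realValues, double[] weights) {
        double total = 0.0;
        for (int i = 0; i < examples.length; i++) {
            double difference = hypothesis(examples[i], weights) - realValues[i];
            total += difference * difference; // squaring keeps each term non-negative
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical data: two examples with two features each.
        double[][] examples = {{1.0, 2.0}, {3.0, 4.0}};
        double[] realValues = {0.0, 1.0};
        double[] weights = {0.0, 0.0}; // with zero weights, every hypothesis is 0.5
        System.out.println("cost = " + cost(examples, realValues, weights));
    }
}
```

The cost is zero only when every hypothesis matches its real value exactly, which is the point the network tries to approach during training.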

The plot for the cost function is as follows:

The first hypothesis is marked on the plot. We want the hypothesis whose cost lies at the zero point, because there the hypothesis equals reality: as the previous equation shows, when the two are equal, their difference is zero. But, as we saw at the beginning, we start really far away from this value.

Now we need to act on the cost value, which measures the accuracy of the hypothesis. To understand the direction in which we need to move, we calculate the derivative of the cost function with respect to each of the weights. Graphically, the derivative is the slope of the cost curve at the current cost value, as shown on the following graph:

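We subtract the derivative value, scaled by the learning rate α, from the current weights. This is mathematically given as follows, where J denotes the cost function:

$$w_1 := w_1 - \alpha \frac{\partial J}{\partial w_1},$$

$$w_2 := w_2 - \alpha \frac{\partial J}{\partial w_2},$$

$$w_3 := w_3 - \alpha \frac{\partial J}{\partial w_3},$$

And so on...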

We keep subtracting these values, iteration by iteration, just doing forward and backward passes, and keep moving closer to the zero point:

Notice the alpha here, or the learning rate. The learning rate defines how big each step is. With a small value, each step is really small and it takes longer to reach the desired value, which slows down learning. With a large value, the steps may overshoot, so the model may never reach the desired point. The learning rate has to be just right.
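The effect of the learning rate can be illustrated on a deliberately simple stand-in cost. The sketch below is not the network's cost function: it assumes a hypothetical one-weight cost J(w) = w², whose derivative is 2w and whose minimum is at w = 0, and the class and method names are assumptions.

```java
/** Gradient descent on the stand-in cost J(w) = w^2 with different learning rates. */
final class LearningRateDemo {

    /** Derivative of the stand-in cost J(w) = w^2. */
    static double derivative(double w) {
        return 2.0 * w;
    }

    /** Repeatedly applies w := w - alpha * dJ/dw and returns the final weight. */
    static double descend(double start, double alpha, int steps) {
        double w = start;
        for (int i = 0; i < steps; i++) {
            w -= alpha * derivative(w);
        }
        return w;
    }

    public static void main(String[] args) {
        double start = 10.0;
        int steps = 50;
        // Too small: still far from the minimum at w = 0 after 50 steps.
        System.out.println("alpha = 0.01 -> w = " + descend(start, 0.01, steps));
        // Reasonable: ends up very close to the minimum.
        System.out.println("alpha = 0.4  -> w = " + descend(start, 0.4, steps));
        // Too large: every step overshoots and the weight moves away from the minimum.
        System.out.println("alpha = 1.1  -> w = " + descend(start, 1.1, steps));
    }
}
```

Which learning rates count as too small or too large depends on the cost function; the three values here only apply to this stand-in.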

As a sanity check, we can monitor the cost function: it should not increase iteration by iteration, and it should decrease in the long term.
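Putting the pieces together, the loop of forward passes, backward passes, and weight updates can be sketched for a single sigmoid neuron while recording the cost at every iteration. This is a minimal sketch under several assumptions: a sum-of-squared-errors cost, a tiny hypothetical dataset with a constant first feature acting as a bias, fixed rather than random starting weights (so that runs are repeatable), and hypothetical names throughout.

```java
/** Batch gradient descent for one sigmoid neuron, recording the cost per iteration. */
final class NeuronTrainer {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    static double hypothesis(double[] x, double[] w) {
        double sum = 0.0;
        for (int j = 0; j < x.length; j++) {
            sum += x[j] * w[j];
        }
        return sigmoid(sum);
    }

    static double cost(double[][] xs, double[] ys, double[] w) {
        double total = 0.0;
        for (int i = 0; i < xs.length; i++) {
            double diff = hypothesis(xs[i], w) - ys[i];
            total += diff * diff;
        }
        return total;
    }

    /** Updates w in place; returns the cost before training and after each iteration. */
    static double[] train(double[][] xs, double[] ys, double[] w, double alpha, int iterations) {
        double[] history = new double[iterations + 1];
        history[0] = cost(xs, ys, w);
        for (int it = 1; it <= iterations; it++) {
            // Sum the contribution of every example before changing the weights.
            double[] gradient = new double[w.length];
            for (int i = 0; i < xs.length; i++) {
                double h = hypothesis(xs[i], w);                 // forward pass
                double common = 2.0 * (h - ys[i]) * h * (1.0 - h); // derivative through the sigmoid
                for (int j = 0; j < w.length; j++) {
                    gradient[j] += common * xs[i][j];
                }
            }
            for (int j = 0; j < w.length; j++) {
                w[j] -= alpha * gradient[j];                     // weight update
            }
            history[it] = cost(xs, ys, w);
        }
        return history;
    }

    public static void main(String[] args) {
        // Hypothetical data: first feature is a constant 1 (bias), then two 0/1 features.
        double[][] xs = {{1, 0, 0}, {1, 0, 1}, {1, 1, 0}, {1, 1, 1}};
        double[] ys = {0, 1, 1, 1};
        double[] w = {0.1, -0.1, 0.1};
        double[] history = train(xs, ys, w, 0.1, 5000);
        for (int it = 0; it <= 5000; it += 1000) {
            System.out.println("iteration " + it + ": cost = " + history[it]);
        }
    }
}
```

Printing or plotting the recorded costs is one simple way to apply the sanity check: if the values grow from one iteration to the next, the learning rate is probably too large.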

Advantages of deep learning

If we consider a simple model, here is what our network would look like:

This just means that a simple model learns in one big step. This may work fine for simple tasks, but for highly complex tasks such as computer vision or image recognition, it is not enough. With a simple model, complex tasks require a lot of manual feature engineering to achieve good precision. To avoid this, we add many more layers of neurons that enable the network to learn step by step, instead of taking one huge leap to the output. The network should look as follows:

The first layer may learn low-level features such as horizontal lines, vertical lines, or diagonal lines, then it passes this knowledge to the second layer, which learns to detect shapes, then the third layer learns color and shapes, and detects more complex things such as faces and so on. By the fourth and the fifth layer, we may be able to detect really high-level features such as humans, cars, trees, or animals.

Another way to see the advantage of deep networks is to picture the output as a function of the input. In a simple model, the output is a direct function of the input. Here, the output is a function of the fifth layer, the fifth layer is a function of the fourth layer, the fourth layer is a function of the third layer, and so on. In this way, we learn far more complex functions than a simple model can.
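This layer-by-layer composition can be sketched in Java as a loop that feeds each layer's output into the next layer. The sketch is illustrative only: the class name, the representation of each layer as a weight matrix with one row per neuron, the omission of bias terms, and the example weights are all assumptions.

```java
/** Forward pass through a stack of sigmoid layers: each layer is a function of the previous one. */
final class DeepForwardPass {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Computes one layer: weights[n] holds the incoming weights of neuron n. */
    static double[] layer(double[] input, double[][] weights) {
        double[] output = new double[weights.length];
        for (int n = 0; n < weights.length; n++) {
            double sum = 0.0;
            for (int i = 0; i < input.length; i++) {
                sum += weights[n][i] * input[i];
            }
            output[n] = sigmoid(sum);
        }
        return output;
    }

    /** Applies the layers in order, so the final output is a composition of all of them. */
    static double[] forward(double[] input, double[][][] layers) {
        double[] activation = input;
        for (double[][] weights : layers) {
            activation = layer(activation, weights);
        }
        return activation;
    }

    public static void main(String[] args) {
        // Hypothetical network: 3 inputs -> 2 hidden neurons -> 1 output neuron.
        double[][][] layers = {
            {{0.5, -0.2, 0.1}, {-0.3, 0.8, 0.4}},
            {{1.0, -1.0}}
        };
        double[] output = forward(new double[]{1.0, 0.5, -1.0}, layers);
        System.out.println("output = " + output[0]);
    }
}
```

A simple model corresponds to calling `layer` once; a deep network is the same call repeated, with each layer building on what the previous one produced.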

So that's it for this section. The next section will be about organizing your data and applications, and at the same time, we will look at a highly efficient computational model for neural networks.