Java Deep Learning Projects

Overview of this book

Java is one of the most widely used programming languages. With the rise of deep learning, it has become a popular choice of tool among data scientists and machine learning experts. Java Deep Learning Projects starts with an overview of deep learning concepts and then delves into advanced projects. You will see how to build several projects using different deep neural network architectures such as multilayer perceptrons, Deep Belief Networks, CNNs, LSTMs, and Factorization Machines. You will get acquainted with popular deep and machine learning libraries for Java such as Deeplearning4j, Spark ML, and RankSys, and you'll be able to use their features to build and deploy projects on distributed computing environments. You will then explore advanced domains such as transfer learning and deep reinforcement learning using the Java ecosystem, covering various real-world domains such as healthcare, NLP, image classification, and multimedia analytics with an easy-to-follow approach. Expert reviews and tips will follow every project to give you insights and hacks. By the end of this book, you will have stepped up your expertise in deep learning with Java, taking it beyond theory, and you will be able to build your own advanced deep learning systems.
Table of Contents (13 chapters)

Delving into deep learning

Simple ML methods that were used in normal-size data analysis are not effective anymore and should be substituted by more robust ML methods. Although classical ML techniques allow researchers to identify groups or clusters of related variables, the accuracy and effectiveness of these methods diminish with large and high-dimensional datasets.

This is where deep learning comes in. It is one of the most important developments in artificial intelligence in the last few years: a branch of ML based on a set of algorithms that attempt to model high-level abstractions in data.

How did DL take ML to the next level?

In short, deep learning algorithms are mostly ANNs that learn better representations of large-scale datasets and build models on top of those representations. Deep learning is no longer limited to ANNs, however, and many theoretical advances, along with software and hardware improvements, were needed to reach this point. In this regard, Ian Goodfellow et al. (Deep Learning, MIT Press, 2016) defined deep learning as follows:

"Deep learning is a particular kind of machine learning that achieves great power and flexibility by learning to represent the world as a nested hierarchy of concepts, with each concept defined in relation to simpler concepts, and more abstract representations computed in terms of less abstract ones."

Let's take an example; suppose we want to develop a predictive analytics model, such as an animal recognizer, where our system has to resolve two problems:

  • To classify whether an image represents a cat or a dog
  • To cluster images of dogs and cats

If we solve the first problem using a typical ML method, we must define the facial features (ears, eyes, whiskers, and so on) and write a method to identify which features (typically nonlinear) are more important when classifying a particular animal.

However, we cannot address the second problem in the same way, because classical ML algorithms for clustering images (such as k-means) cannot handle nonlinear features. Deep learning algorithms take both problems one step further: they determine which features matter most for classification or clustering and extract those features automatically.

In contrast, when using a classical ML algorithm, we would have to provide the features manually. In summary, the deep learning workflow would be as follows:

  • A deep learning algorithm would first identify the edges that are most relevant when clustering cats or dogs. It would then try to find various combinations of shapes and edges hierarchically. This step is called ETL.
  • After several iterations, hierarchical identification of complex concepts and features is carried out. Then, based on the identified features, the DL algorithm automatically decides which of these features are most significant (statistically) to classify the animal. This step is feature extraction.
  • Next, for the clustering problem, it drops the label column and performs unsupervised training using AutoEncoders (AEs) to extract latent features, which are then passed to k-means for clustering.
  • Finally, the clustering assignment hardening loss (CAH loss) and the reconstruction loss are jointly optimized towards an optimal clustering assignment. Deep Embedding Clustering (see more at https://arxiv.org/pdf/1511.06335.pdf) is an example of such an approach. We will discuss deep learning-based clustering approaches in Chapter 11, Discussion, Current Trends, and Outlook.
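
To make the clustering assignment hardening idea more concrete, here is a minimal Java sketch of how such a loss could be computed on latent features that have already been extracted. It assumes the formulation used in the Deep Embedding Clustering paper: soft assignments from a Student's t kernel, a sharpened target distribution, and a KL divergence between the two. The class name, method names, and toy data are hypothetical, and the autoencoder and its reconstruction loss are left out entirely.

```java
// Illustrative sketch only: names and data are hypothetical, not from the book.
final class ClusteringLossSketch {

    // Soft assignment q[i][j] of latent point z[i] to centroid mu[j],
    // using a Student's t kernel with one degree of freedom.
    public static double[][] softAssign(double[][] z, double[][] mu) {
        double[][] q = new double[z.length][mu.length];
        for (int i = 0; i < z.length; i++) {
            double rowSum = 0.0;
            for (int j = 0; j < mu.length; j++) {
                double dist2 = 0.0;
                for (int d = 0; d < z[i].length; d++) {
                    double diff = z[i][d] - mu[j][d];
                    dist2 += diff * diff;
                }
                q[i][j] = 1.0 / (1.0 + dist2);
                rowSum += q[i][j];
            }
            for (int j = 0; j < mu.length; j++) {
                q[i][j] /= rowSum; // each row becomes a probability distribution
            }
        }
        return q;
    }

    // Sharpened target p[i][j], proportional to q[i][j]^2 / f[j],
    // where f[j] is the total soft assignment mass of cluster j.
    public static double[][] targetDistribution(double[][] q) {
        int n = q.length;
        int k = q[0].length;
        double[] f = new double[k];
        for (double[] row : q) {
            for (int j = 0; j < k; j++) {
                f[j] += row[j];
            }
        }
        double[][] p = new double[n][k];
        for (int i = 0; i < n; i++) {
            double rowSum = 0.0;
            for (int j = 0; j < k; j++) {
                p[i][j] = q[i][j] * q[i][j] / f[j];
                rowSum += p[i][j];
            }
            for (int j = 0; j < k; j++) {
                p[i][j] /= rowSum;
            }
        }
        return p;
    }

    // Hardening loss as the KL divergence KL(P || Q), summed over all points.
    public static double hardeningLoss(double[][] p, double[][] q) {
        double loss = 0.0;
        for (int i = 0; i < p.length; i++) {
            for (int j = 0; j < p[i].length; j++) {
                if (p[i][j] > 0.0) {
                    loss += p[i][j] * Math.log(p[i][j] / q[i][j]);
                }
            }
        }
        return loss;
    }

    public static void main(String[] args) {
        // Hypothetical 2-D latent features, as if produced by an autoencoder.
        double[][] latent = {{0.0, 0.1}, {0.2, 0.0}, {3.0, 3.1}, {2.9, 3.0}};
        // Hypothetical initial centroids, as if produced by k-means.
        double[][] centroids = {{0.5, 0.5}, {2.5, 2.5}};

        double[][] q = softAssign(latent, centroids);
        double[][] p = targetDistribution(q);
        System.out.println("CAH loss = " + hardeningLoss(p, q));
    }
}
```

In a full system, this loss would be minimized together with the reconstruction loss by updating both the encoder weights and the centroids; the sketch only evaluates the loss once for fixed inputs.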

Up to this point, we have seen that deep learning systems are able to recognize what an image represents. A computer does not see an image as we see it because it only knows the position of each pixel and its color. Using deep learning techniques, the image is divided into various layers of analysis.

At a lower level, the software analyzes, for example, a grid of a few pixels with the task of detecting a type of color or various nuances. If it finds something, it informs the next level, which checks whether or not that color belongs to a larger form, such as a line. The process continues through the upper levels until the system determines what is shown in the image. The following diagram shows what we have discussed in the case of an image classification system:

A deep learning system at work on a dog versus cat classification problem

More precisely, the preceding image classifier can be built layer by layer, as follows:

  • Layer 1: The algorithm starts identifying the dark and light pixels from the raw images
  • Layer 2: The algorithm then identifies edges and shapes
  • Layer 3: It then learns more complex shapes and objects
  • Layer 4: The algorithm then learns which objects define a human face
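
The first two layers can be imitated by hand to show what kind of computation is involved. The following Java sketch is an illustration only: it uses a fixed brightness threshold and a neighbor comparison, whereas a deep network would learn comparable filters from data. The class name, method names, threshold value, and tiny test image are all hypothetical.

```java
// Illustrative sketch only: hand-coded stand-ins for what a network would learn.
final class PixelLayersSketch {

    // "Layer 1": mark each grayscale pixel as light (1) or dark (0).
    public static int[][] lightDarkMap(int[][] gray, int threshold) {
        int[][] binary = new int[gray.length][gray[0].length];
        for (int y = 0; y < gray.length; y++) {
            for (int x = 0; x < gray[y].length; x++) {
                binary[y][x] = gray[y][x] >= threshold ? 1 : 0;
            }
        }
        return binary;
    }

    // "Layer 2": a pixel lies on an edge if it differs from its right
    // or lower neighbor in the light/dark map.
    public static boolean[][] edgeMap(int[][] binary) {
        int h = binary.length;
        int w = binary[0].length;
        boolean[][] edges = new boolean[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                boolean right = x + 1 < w && binary[y][x] != binary[y][x + 1];
                boolean down = y + 1 < h && binary[y][x] != binary[y + 1][x];
                edges[y][x] = right || down;
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        // Hypothetical 4x4 grayscale image: dark left half, light right half.
        int[][] image = {
            {10, 10, 200, 200},
            {10, 10, 200, 200},
            {10, 10, 200, 200},
            {10, 10, 200, 200}
        };
        boolean[][] edges = edgeMap(lightDarkMap(image, 128));
        for (boolean[] row : edges) {
            StringBuilder line = new StringBuilder();
            for (boolean e : row) {
                line.append(e ? '#' : '.');
            }
            System.out.println(line);
        }
    }
}
```

Here the only edge is the vertical boundary between the dark and light halves. Later layers would combine many such local responses into shapes and objects.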

Although this is a very simple classifier, software capable of doing these kinds of things is now widespread and is found, for example, in face recognition systems and in image-based search on Google. Such software is based on deep learning algorithms.

In contrast, we cannot build such applications with a linear ML algorithm, since these algorithms are incapable of handling nonlinear image features. Also, with classical ML approaches, we typically handle only a few hyperparameters. However, when neural networks are brought to the party, things become far more complex. A network can have millions or even billions of trainable parameters (weights) spread across its layers, and composing many layers makes the cost function non-convex.

Another reason is that activation functions used in hidden layers are nonlinear, so the cost is non-convex. We will discuss this phenomenon in more detail in later chapters but let's take a quick look at ANNs.
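
Before moving on, the non-convexity claim can be checked numerically on the smallest possible example. The Java sketch below assumes a hypothetical network with a single tanh hidden unit and a squared-error cost on one training point. A convex function can never be larger at the midpoint of two points than the average of its values at those points, so finding one such midpoint is enough to show that the cost is not convex. All names and values are illustrative.

```java
// Illustrative sketch only: a one-hidden-unit network, not a real model.
final class NonConvexLossSketch {

    // Squared error of the prediction w2 * tanh(w1 * x) against target y.
    public static double loss(double w1, double w2, double x, double y) {
        double prediction = w2 * Math.tanh(w1 * x);
        double error = prediction - y;
        return error * error;
    }

    public static void main(String[] args) {
        double x = 1.0;
        double y = 1.0;
        double w2 = 1.0 / Math.tanh(1.0); // chosen so the network fits y

        // Two weight settings that fit the training point equally well:
        // flipping the sign of both weights leaves the prediction unchanged.
        double lossA = loss(1.0, w2, x, y);
        double lossB = loss(-1.0, -w2, x, y);

        // Their midpoint in weight space is (0, 0), which predicts 0.
        double lossMid = loss(0.0, 0.0, x, y);

        System.out.println("loss at A        = " + lossA);
        System.out.println("loss at B        = " + lossB);
        System.out.println("loss at midpoint = " + lossMid);
        System.out.println("convexity violated: " + (lossMid > 0.5 * (lossA + lossB)));
    }
}
```

The two mirrored weight settings are both near-perfect fits, yet the point halfway between them is clearly worse. This sign symmetry of the hidden unit is one simple source of multiple minima in neural network cost functions.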