Deep Learning Essentials

By Wei Di, Jianing Wei, Anurag Bhardwaj

Overview of this book

Deep learning is a trending topic in the field of Artificial Intelligence today and can be considered an advanced form of machine learning. This book will help you take your first steps in training efficient deep learning models and applying them in various practical scenarios. You will model, train, and deploy different kinds of neural networks, such as CNNs and RNNs, and will see some of their applications in real-world domains including computer vision, natural language processing, and speech recognition. You will build practical projects such as chatbots, implement reinforcement learning to build smart games, and develop expert systems for image captioning and processing using Python libraries such as TensorFlow. This book also covers solutions for different problems you might come across while training models, such as noisy or small datasets. By the end of this book, you will have a firm understanding of the basics of deep learning and neural network modeling, along with their practical applications.

Why deep learning?

So far, we have discussed what deep learning is and its history. But why is it so popular now? In this section, we talk about the advantages of deep learning over traditional shallow methods and its significant impact in a couple of technical fields.

Advantages over traditional shallow methods

Traditional approaches are often considered shallow machine learning; they typically require the developer to have some prior knowledge of which specific features of the input might be helpful, or of how to design effective features. Also, shallow learning often uses only one hidden layer, for example, a single-layer feed-forward network. In contrast, deep learning is known as representation learning, which has been shown to perform better at extracting non-local and global relationships or structures in the data. One can supply fairly raw formats of data to the learning system, for example, raw images and text, rather than features extracted on top of images (for example, SIFT, from David Lowe's Object Recognition from Local Scale-Invariant Features, and HOG, from Dalal and Triggs' Histograms of Oriented Gradients for Human Detection) or TF-IDF vectors for text. Because of the depth of the architecture, the learned representations form a hierarchical structure, with knowledge learned at various levels. This parameterized, multi-level computational graph provides a high degree of representational power. The emphases of shallow and deep algorithms are significantly different: shallow algorithms are more about feature engineering and selection, while deep learning puts its emphasis on defining the most useful computational graph topology (architecture) and on optimizing parameters/hyperparameters efficiently and correctly, so that the learned representations generalize well:

Comparing deep and shallow architectures. It can be seen that the deep architecture has many layers of hierarchical topology
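To make the contrast concrete, the following is a minimal sketch (our own illustration, not code from the book) comparing the two pipelines on scikit-learn's small 8 x 8 digit images: a shallow pipeline with hand-crafted HOG features feeding a linear classifier, versus a small CNN trained end to end on raw pixels. The dataset, layer sizes, and hyperparameters are arbitrary choices for illustration:

import numpy as np
import tensorflow as tf
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from skimage.feature import hog

digits = load_digits()
X_img, y = digits.images, digits.target   # raw 8 x 8 grayscale images

# Shallow pipeline: the developer hand-picks a feature extractor (HOG)
# and its parameters, then fits a linear classifier on the features.
X_hog = np.array([hog(img, orientations=9, pixels_per_cell=(4, 4),
                      cells_per_block=(1, 1)) for img in X_img])
Xtr, Xte, ytr, yte = train_test_split(X_hog, y, random_state=0)
shallow = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
print("HOG + linear model accuracy:", shallow.score(Xte, yte))

# Deep pipeline: raw pixels go straight into a small CNN, which learns
# its own hierarchy of features end to end.
Xtr_i, Xte_i, ytr_i, yte_i = train_test_split(
    X_img[..., None] / 16.0, y, random_state=0)   # pixel values are 0..16
cnn = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(8, 8, 1)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
cnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
cnn.fit(Xtr_i, ytr_i, epochs=5, verbose=0)
print("Raw pixels + CNN accuracy:", cnn.evaluate(Xte_i, yte_i, verbose=0)[1])

Note that the shallow pipeline's quality is capped by how good the hand-chosen features are, while the CNN is free to learn whatever features fit the task.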

As noted, deep learning algorithms perform better at extracting non-local and global relationships and patterns in the data than relatively shallow learning architectures. Other useful characteristics of the abstract representations learned by deep learning include:

  • They can exploit most of the abundant, huge volumes of data available, even when the data is unlabeled.
  • They have the advantage of continuing to improve as more training data is added.
  • They extract data representations automatically, from unsupervised or supervised data, in a distributed and hierarchical fashion; this usually works best when the input space is locally structured, spatially or temporally, for example, images, language, and speech.
  • Representation extraction from unsupervised data enables broad application to different data types, such as image, text, and audio data.
  • Relatively simple linear models can work effectively with the knowledge obtained from more complex and more abstract data representations. This means that, with advanced features extracted, the subsequent learning model can be relatively simple, which may help reduce computational complexity, for example, in the case of linear modeling (see the transfer-learning sketch after the next figure).
  • Relational and semantic knowledge can be obtained at higher levels of abstraction and representation of the raw data (Yoshua Bengio and Yann LeCun, Scaling Learning Algorithms towards AI, 2007, source: https://journalofbigdata.springeropen.com/articles/10.1186/s40537-014-0007-7).
  • Deep architectures can be representationally efficient. This sounds contradictory, but it is a great benefit that comes from the distributed representational power of deep learning.
  • The learning capacity of deep learning algorithms is proportional to the size of the data; that is, performance increases as the input data increases, whereas for shallow or traditional learning algorithms performance reaches a plateau after a certain amount of data is provided, as shown in the following figure, Learning capability of deep learning versus traditional machine learning:
Learning capability of deep learning versus traditional machine learning
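To illustrate the point about simple linear models working on top of abstract representations, here is a minimal transfer-learning sketch (our own illustration, assuming a hypothetical five-class image task; the choice of MobileNetV2 and all sizes are arbitrary). A pretrained deep network is frozen and used purely as a feature extractor, and only a single softmax layer is trained on top of its output:

import tensorflow as tf

NUM_CLASSES = 5   # hypothetical number of target classes

# Pretrained deep network used purely as a feature extractor; its
# weights stay fixed, so it only supplies abstract representations.
base = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False,
    pooling="avg", weights="imagenet")
base.trainable = False

# A single linear (softmax) layer is the only part that is trained.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)   # inputs scaled to [-1, 1]

Because the hard representational work is already done by the frozen network, the trainable part of this model is just a linear classifier, which is cheap to fit even on small datasets.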

Impact of deep learning

To show you some of the impact of deep learning, let’s take a look at two specific areas: image recognition and speech recognition.

The following figure, Performance on ImageNet classification over time, shows the top-5 error rate trends of ILSVRC contest winners over the past several years. Traditional image recognition approaches employed hand-crafted features fed into shallow classifiers trained on a number of instances of each object class, for example, SIFT + Fisher vectors. In 2012, deep learning entered this competition: Alex Krizhevsky and Professor Hinton from the University of Toronto stunned the field with a roughly 10% drop in the error rate achieved by their deep convolutional neural network (AlexNet). Since then, the leaderboard has been occupied by this type of method and its variations. By 2015, the error rate had dropped below that of human testers:

Performance on ImageNet classification over time

The following figure, Speech recognition progress, depicts recent progress in the area of speech recognition. From 2000 to 2009, there was very little progress. Since 2009, the combination of deep learning, large datasets, and fast computing has significantly boosted development. In 2016, a major breakthrough was made by a team of researchers and engineers at Microsoft Research AI (MSR AI). They reported a speech recognition system that made the same or fewer errors than professional transcriptionists, with a word error rate (WER) of 5.9%. In other words, the technology could recognize words in a conversation as well as a person does:

Speech recognition progress
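As a side note on the metric: WER is the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. Here is a minimal sketch of the computation (our own illustration using word-level Levenshtein distance, not the MSR team's code):

def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1   # substitution
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)
    return d[len(ref)][len(hyp)] / len(ref)

# Two missed words out of six gives a WER of about 0.33
print(word_error_rate("the cat sat on the mat", "the cat sat mat"))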

A natural question to ask is: what are the advantages of deep learning over traditional approaches? Topology defines functionality, but why do we need such expensive deep architectures? Are they really necessary? What are we trying to achieve here? It turns out that there is both theoretical and empirical evidence in favor of multiple levels of representation. In the next section, let’s dive into more details about the deep architecture of deep learning.