Mobile Deep Learning with TensorFlow Lite, ML Kit and Flutter

By: Anubhav Singh, Rimjhim Bhadani

Overview of this book

Deep learning is rapidly becoming one of the most popular topics in the mobile app industry. This book introduces trending deep learning concepts and their use cases with an industrial and application-focused approach. Through eight projects covering tasks such as mobile vision, facial recognition, smart artificial intelligence assistants, and augmented reality, you will learn how to integrate deep learning into the iOS and Android mobile platforms. This will help you to transform deep learning features into robust mobile apps efficiently. You'll get hands-on experience of selecting the right deep learning architectures and optimizing mobile deep learning models while following an application-oriented approach to deep learning on native mobile apps. We will later cover various pre-trained and custom-built deep learning model-based APIs, such as ML Kit through Firebase. Further on, the book will take you through examples of creating custom deep learning models with TensorFlow Lite. Each project will demonstrate how to integrate deep learning libraries into your mobile apps, right from preparing the model through to deployment. By the end of this book, you'll have mastered the skills to build and deploy deep learning mobile applications on both iOS and Android.

Introducing some common deep learning architectures

After a brief revision of the key terms, we are now ready to dive deeper into the world of deep learning. In this section, we will learn about some well-known deep learning architectures and how they work.

Convolutional neural networks

Inspired by the animal visual cortex, a convolutional neural network (CNN) is primarily used for, and is the de facto standard for, image processing. The core concept of the convolutional layer is the presence of kernels (or filters) that learn to differentiate between the features of an image. A kernel is usually a much smaller matrix than the image matrix, and is passed over the entire image in a sliding-window fashion, producing the dot product of the kernel with the corresponding slice of the image matrix at each position. These dot products allow the program to identify the features in the image.

Consider the following image vector:

[[10, 10, 10, 0, 0, 0],
[10, 10, 10, 0, 0, 0],
[10, 10, 10, 0, 0, 0],
[0, 0, 0, 10, 10, 10],
[0, 0, 0, 10, 10, 10],
[0, 0, 0, 10, 10, 10]]

The preceding matrix corresponds to an image that looks like this:

To detect horizontal edges, we apply a filter defined by the following matrix:

[[1, 1, 1],
[0, 0, 0],
[-1, -1, -1]]

The output matrix produced after the convolution of the original image with the filter is as follows:

[[  0,  0,   0,   0],
 [ 30, 10, -10, -30],
 [ 30, 10, -10, -30],
 [  0,  0,   0,   0]]
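This output can be reproduced with a short sliding-window convolution. The following is a minimal NumPy sketch; the `convolve2d` helper here is our own illustration of the dot-product mechanism described above, not a library function:

```python
import numpy as np

# The 6x6 image and 3x3 horizontal-edge filter from the text.
image = np.array([[10, 10, 10, 0, 0, 0]] * 3 +
                 [[0, 0, 0, 10, 10, 10]] * 3)
kernel = np.array([[ 1,  1,  1],
                   [ 0,  0,  0],
                   [-1, -1, -1]])

def convolve2d(img, k):
    """Slide the kernel over the image (no padding, stride 1) and
    take the dot product with each image slice, as described above."""
    kh, kw = k.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=img.dtype)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * k)
    return out

print(convolve2d(image, kernel))
```

Running this prints the 4 x 4 output matrix shown above, with strong responses (30 and -30) where the filter straddles a horizontal intensity transition.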

No edges are detected in the upper half or lower half of the image. Moving from the left edge toward the vertical middle of the image, a clear horizontal edge is found (the response of 30). Moving further right, two weaker responses (10 and -10) appear before another clear horizontal edge (-30). However, this second clear edge has the opposite sign to the first, because the intensity transition is reversed: dark-to-bright rather than bright-to-dark.

Thus, by simple convolutions, it is possible to uncover patterns in the image files. CNNs also use several other concepts, such as pooling.

It is possible to understand pooling from the following screenshot:

In the simplest terms, pooling is the method of consolidating several image pixels into a single pixel. The pooling method used in the preceding screenshot is known as max pooling, wherein only the largest value within each sliding window is kept in the resultant matrix. This greatly simplifies the image and helps to train filters that are generic rather than exclusive to a single image.
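Max pooling can be sketched in a few lines. The example below assumes a 2 x 2 window with a stride of 2 (the screenshot itself is not reproduced here, so the window size is our own illustrative choice):

```python
import numpy as np

def max_pool(img, size=2):
    """Reduce each non-overlapping size x size block of the image
    to its maximum value (max pooling with stride == window size)."""
    h, w = img.shape
    out = np.zeros((h // size, w // size), dtype=img.dtype)
    for i in range(0, h - size + 1, size):
        for j in range(0, w - size + 1, size):
            out[i // size, j // size] = img[i:i+size, j:j+size].max()
    return out

img = np.array([[1, 3, 2, 1],
                [4, 2, 1, 5],
                [3, 1, 1, 0],
                [2, 0, 4, 2]])
print(max_pool(img))  # each 2x2 block collapses to its largest value
```

The 4 x 4 input becomes a 2 x 2 output, halving each spatial dimension while keeping the strongest activation in each region.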

Generative adversarial networks

Generative adversarial networks (GANs) are a fairly new concept in the field of AI and have come as a major breakthrough in recent times. They were introduced by Ian Goodfellow in his 2014 research paper. The core idea behind a GAN is that two neural networks run in parallel and compete against each other. The first neural network generates samples and is called the generator. The other neural network tries to classify samples based on the data previously provided and is called the discriminator. The functioning of GANs can be understood with the following screenshot:

Here, a random input vector undergoes the generative process to produce fake images, which are then classified by the discriminator, which has been trained on real images. The fake images with higher classification confidence are used for further generation, while those with lower confidence are discarded. Over time, the discriminator learns to correctly recognize fake images, while the generator learns to produce images that resemble the real images more closely with each generation.

What we have at the end of training is a system that can produce near-real data, and also a system that can classify samples with very high precision.
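The adversarial loop described above can be sketched in miniature. The toy example below is entirely our own construction, not from the text: the two neural networks are replaced by a one-parameter "generator" (a learnable shift applied to noise) and a logistic-regression "discriminator", competing over 1-D samples instead of images. The structure of the loop, alternating discriminator and generator updates, is the point:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy "real" data: samples from a 1-D Gaussian centred at 4.
def real_batch(n=64):
    return rng.normal(4.0, 0.5, n)

g_shift = 0.0          # generator parameter: a shift applied to noise
d_w, d_b = 0.1, 0.0    # discriminator parameters: logistic regression
lr = 0.05

for step in range(2000):
    noise = rng.normal(0.0, 0.5, 64)
    fake = noise + g_shift
    real = real_batch()

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    for x, label in ((real, 1.0), (fake, 0.0)):
        p = sigmoid(d_w * x + d_b)
        grad = p - label                  # d(cross-entropy)/d(logit)
        d_w -= lr * np.mean(grad * x)
        d_b -= lr * np.mean(grad)

    # Generator step: push D(fake) toward 1 by moving the shift.
    p = sigmoid(d_w * (noise + g_shift) + d_b)
    g_shift -= lr * np.mean((p - 1.0) * d_w)

# After training, generated samples are centred near the real mean (~4).
print(g_shift)
```

With each round, the discriminator sharpens its decision boundary and the generator moves its output distribution toward the real one, which is exactly the dynamic the screenshot illustrates for images.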

We will learn more about GANs in the upcoming chapters.

For an in-depth study of GANs, you can read the research paper by Ian Goodfellow at

Recurrent neural networks

Not all data in the world exists independently of time. Stock market prices and spoken or written words are just a few examples of data that is bound to a time series. Such data has a temporal dimension, and it is intuitive that a model able to treat it as a sequence unfolding over time, rather than as a static chunk, would produce better prediction accuracy. In many cases, this has been found to be true, and it has led to the emergence of neural network architectures that can take time as a factor while learning and predicting.

One such architecture is the recurrent neural network (RNN). The major characteristic of such a network is that it not only passes data from one layer to the next in a sequential manner, but it can also take data from previous layers or time steps. Recall from the Understanding machine learning and deep learning section the diagram of a simple artificial neural network (ANN) with two hidden layers, in which data was fed to each layer only by the previous layer. In an RNN with, say, two hidden layers, the input to the second hidden layer need not be provided only by the first hidden layer, as would be the case in a simple ANN.

This is depicted by the dashed arrows in the following screenshot:

RNNs, in contrast to simple ANNs, are trained with a method called backpropagation through time (BPTT) instead of classic backpropagation. BPTT ensures that time is well represented in the backward propagation of the error: the network is unrolled across time steps, and the error is defined as a function of the inputs that recur in the network.
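The recurrence itself can be illustrated with a minimal forward pass over a sequence. The sizes and random weights below are our own illustrative assumptions; the key line is the one where the previous hidden state feeds back into the update:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 3-dimensional inputs, 4 hidden units, 5 time steps.
input_size, hidden_size, steps = 3, 4, 5
W_xh = rng.normal(0, 0.1, (hidden_size, input_size))   # input-to-hidden weights
W_hh = rng.normal(0, 0.1, (hidden_size, hidden_size))  # hidden-to-hidden (the recurrence)
b_h = np.zeros(hidden_size)

h = np.zeros(hidden_size)  # initial hidden state
sequence = rng.normal(0, 1, (steps, input_size))

for x_t in sequence:
    # The previous hidden state h feeds back in at every step; this
    # loop is exactly what BPTT "unrolls" when propagating the error
    # backward through time.
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)

print(h)  # final hidden state summarises the whole sequence
```

Because `h` is carried from step to step, the final state depends on the entire input sequence, not just the last element, which is what gives the network its temporal memory.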

Long short-term memory

Vanishing and exploding gradients are very commonly observed in RNNs. They are a severe bottleneck in the implementation of deep RNNs on data in which the relationships between features are more complex than linear functions. To overcome the vanishing gradient problem, the concept of long short-term memory (LSTM) was introduced by the German researchers Sepp Hochreiter and Jürgen Schmidhuber in 1997.

LSTM has proved highly useful in the fields of NLP, image caption generation, speech recognition, and other domains, where it broke previously established records soon after it was introduced. LSTMs store information outside the normal flow of the network, in a state that can be recalled at any moment, much like a secondary storage device in a computer system. This allows delayed rewards to be introduced to the network. A spiritual analogy has been made for LSTMs, likening this stored state to "karma": the reward a person receives for actions carried out in the past.
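The mechanics behind that stored memory can be sketched as a single LSTM cell step. The gate equations below follow the standard Hochreiter-Schmidhuber formulation; the sizes and random weights are our own illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative sizes: 3-dimensional input, 4 hidden units.
input_size, hidden_size = 3, 4
z = input_size + hidden_size  # gates see [input, previous hidden] concatenated
W_f, W_i, W_o, W_c = (rng.normal(0, 0.1, (hidden_size, z)) for _ in range(4))
b_f = np.ones(hidden_size)    # forget-gate bias often initialised to 1
b_i = np.zeros(hidden_size)
b_o = np.zeros(hidden_size)
b_c = np.zeros(hidden_size)

h = np.zeros(hidden_size)     # short-term (hidden) state
c = np.zeros(hidden_size)     # long-term (cell) state: the "memory"

x_t = rng.normal(0, 1, input_size)
xh = np.concatenate([x_t, h])

f = sigmoid(W_f @ xh + b_f)        # forget gate: what to keep in c
i = sigmoid(W_i @ xh + b_i)        # input gate: what to write to c
o = sigmoid(W_o @ xh + b_o)        # output gate: what to expose as h
c_tilde = np.tanh(W_c @ xh + b_c)  # candidate memory content
c = f * c + i * c_tilde            # update the long-term memory
h = o * np.tanh(c)                 # new hidden state read from memory

print(h.shape, c.shape)
```

The cell state `c` flows forward largely untouched unless the gates decide to erase or overwrite it, which is what lets an LSTM carry information, and hence delayed rewards, across many time steps without the gradient vanishing.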

We shall be diving deeper into LSTMs and CNNs in the upcoming chapters of this book.