Hands-On Convolutional Neural Networks with TensorFlow

By: Iffat Zafar, Giounona Tzanidou, Richard Burton, Nimesh Patel, Leonardo Araujo

Overview of this book

Convolutional Neural Networks (CNNs) are one of the most popular architectures used in computer vision applications. This book introduces CNNs by solving real-world deep learning problems while teaching you how to implement them in the popular Python library TensorFlow. By the end of the book, you will be training CNNs in no time! We start with an overview of popular machine learning and deep learning models, and then get you set up with a TensorFlow development environment. This environment is the basis for implementing and training deep learning models in later chapters. Then, you will use Convolutional Neural Networks to work on problems such as image classification, object detection, and semantic segmentation. After that, you will use transfer learning to see how these models can solve other deep learning problems. You will also get a taste of implementing generative models such as autoencoders and generative adversarial networks. Later on, you will see useful tips on machine learning best practices and troubleshooting. Finally, you will learn how to apply your models to large datasets of millions of images.

TensorFlow API levels


Before we start writing TensorFlow code, it is important to be aware of the different levels of API abstraction that TensorFlow offers in Python. Knowing what is available helps us choose the right functions or operations for the job. In most cases, there is little need to rewrite from scratch something that TensorFlow already provides.

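TensorFlow offers three layers of API abstraction to help you write your code. From lowest to highest, these are the low-level ops, a middle level that includes the layers, datasets, and metrics APIs, and the estimators API at the top.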

At the lowest level, you have the basic TensorFlow ops such as tf.nn.conv2d and tf.nn.relu. These low-level primitives give the user the most control when working with TensorFlow. However, that control comes at a price: you have to look after many more things yourself when constructing a graph, and you have to write more boilerplate code.

Don't worry about understanding the following code examples yet. That will come very soon, I promise. They are here only to demonstrate the different API levels in TensorFlow.

So, for example, if we want to create a convolution layer to use in our ML model, then this might look something like the following:

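import tensorflow as tf  # TensorFlow 1.x API

def my_conv_2d(inputs, weight_shape, num_filters, strides):
    # weight_shape is [filter_height, filter_width, in_channels, num_filters]
    my_weights = tf.get_variable(name="weights", shape=weight_shape)
    my_bias = tf.get_variable(name="bias", shape=[num_filters])
    # tf.nn.conv2d expects the padding string in upper case: 'SAME' or 'VALID'
    my_conv = tf.nn.conv2d(inputs, my_weights, strides=strides, padding='SAME', name='conv_layer1')
    my_conv = tf.nn.bias_add(my_conv, my_bias)
    conv_layer_out = tf.nn.relu(my_conv)
    return conv_layer_out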

This example is much simpler than what you would actually implement, but you can already see the number of lines of code starting to build up, along with things you have to take care of, such as constructing weights and adding bias terms. A model would also have many different kinds of layers, not just a convolution layer, and all of them would have to be constructed in a very similar way.

So, not only is it laborious to write these things out for every new kind of layer you want in your model, it also creates more places where bugs can work their way into your code, which is never a good thing.
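To make the three steps chained by the low-level function more concrete, here is a minimal sketch in plain Python of what a convolution, a bias addition, and a ReLU compute on a tiny single-channel image. It is an illustration only and does not use TensorFlow: the helper names (conv2d_valid, bias_add, relu, toy_conv_layer) are hypothetical, and the sketch assumes a stride of 1, no padding, and a single filter. Like tf.nn.conv2d, it slides the kernel without flipping it.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image with stride 1 and no padding."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch under the kernel.
            total = 0.0
            for di in range(k_h):
                for dj in range(k_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        out.append(row)
    return out


def bias_add(feature_map, bias):
    """Add the same scalar bias to every position of the feature map."""
    return [[value + bias for value in row] for row in feature_map]


def relu(feature_map):
    """Replace negative values with zero."""
    return [[max(0.0, value) for value in row] for row in feature_map]


def toy_conv_layer(image, kernel, bias):
    """Chain the three steps, mirroring conv2d -> bias_add -> relu."""
    return relu(bias_add(conv2d_valid(image, kernel), bias))


image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
kernel = [[1.0, 0.0],
          [0.0, -1.0]]

result = toy_conv_layer(image, kernel, bias=5.0)
print(result)  # [[1.0, 1.0], [1.0, 1.0]]
```

Every patch gives 1 * top-left minus 1 * bottom-right, which is -4 for this image, so the bias of 5 lifts each value to 1 before the ReLU. With a bias of 0, the ReLU would clip every value to 0.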

Luckily for us, TensorFlow has a second level of abstraction that helps to make your life easier when building TensorFlow graphs. One example from this level of abstraction is the layers API. The layers API allows you to work easily with many of the building blocks that are common across many machine learning tasks.

The layers API works by wrapping up everything we wrote in the previous example and abstracting it away from us, so we don't have to worry about it anymore. For example, we can condense the preceding code to construct a convolution layer into one function call. Building the same convolution layer as before would now look like this:

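# Assumes `import tensorflow as tf` (TensorFlow 1.x API)
def my_conv_2d(inputs, kernel_size, num_filters, strides):
    # tf.layers.conv2d creates the weights and bias, then applies the activation
    conv_layer_out = tf.layers.conv2d(inputs, filters=num_filters, kernel_size=kernel_size, strides=strides, padding='same', activation=tf.nn.relu, name='conv_layer1')
    return conv_layer_out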

Two other APIs work alongside layers. The first is the datasets API, which makes it easy to load data and feed it to your TensorFlow graph. The second is the metrics API, which provides tools to measure how well your trained machine learning models are doing. We will learn about all of these later in the book.

The final layer of the API stack, and the highest level of abstraction that TensorFlow provides, is the estimators API. In much the same way that tf.layers takes care of constructing weights and adding biases for an individual layer, the estimators API wraps up the construction of many layers so that we can define a whole model, made up of multiple different layers, in one function call.
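The way each level wraps the one beneath it can be sketched with a toy example in plain Python. None of the names below are TensorFlow APIs: matmul_vec, add_bias, relu, dense_layer, and build_model are hypothetical helpers, and the sketch uses fully connected layers on Python lists purely to keep it short. It only mirrors the construction side of the stack and leaves out training and evaluation.

```python
import random


# Level 1: low-level "ops". The caller manages every weight and bias.
def matmul_vec(weights, x):
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def add_bias(values, bias):
    return [v + b for v, b in zip(values, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


# Level 2: a "layer" helper that creates its own parameters and
# chains the low-level ops, so the caller only states the layer size.
def dense_layer(x, units, rng):
    weights = [[rng.uniform(-1.0, 1.0) for _ in x] for _ in range(units)]
    bias = [0.0] * units
    return relu(add_bias(matmul_vec(weights, x), bias))


# Level 3: a "model" helper that stacks several layers in one call.
def build_model(x, layer_sizes, seed=0):
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    for units in layer_sizes:
        x = dense_layer(x, units, rng)
    return x


output = build_model([0.5, -1.0, 2.0], layer_sizes=[4, 3, 2])
print(len(output))  # 2, one value per unit in the last layer
```

Each level hides the bookkeeping of the level below: the caller of build_model never sees a weight matrix, just as the caller of dense_layer never calls add_bias directly.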

The estimators API will not be covered in this book, but if the reader wishes to learn more about estimators, there are some useful tutorials available on the TensorFlow website.

This book will focus on using the low-level APIs along with the layers, datasets, and metrics APIs to construct, train, and evaluate your own ML models. We believe that by getting hands-on with these lower-level APIs, the reader will come away with a greater understanding of how TensorFlow works under the hood and be better equipped to tackle future problems that require these lower-level functions.

Eager execution

At the time of writing, Google had just introduced the eager execution API to TensorFlow. Eager execution is TensorFlow's answer to another deep learning library called PyTorch. It allows you to bypass the usual TensorFlow way of working, where you must first define a computational graph and then execute the graph to get a result. This usual approach is known as static graph computation. With eager execution, you can instead create so-called dynamic graphs that are defined on the fly as you run your program. This allows for a more traditional, imperative way of programming when using TensorFlow. Unfortunately, eager execution is still under development, with some features still missing, and will not be featured in this book. More information on eager execution can be found on the TensorFlow website.
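The difference between the two styles can be illustrated with a toy sketch in plain Python. This is not TensorFlow code: the Node class and the placeholder, add, and multiply helpers are hypothetical stand-ins that only mimic the idea of defining a graph first and running it later, compared with computing values immediately.

```python
class Node:
    """A deferred operation: it stores what to compute, not the result."""

    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs

    def run(self, feed):
        # Evaluate upstream nodes first, then apply this node's operation.
        args = [i.run(feed) if isinstance(i, Node) else i for i in self.inputs]
        return self.op(feed, *args)


def placeholder(name):
    # The value is looked up in the feed dictionary at run time.
    return Node(lambda feed: feed[name])


def add(a, b):
    return Node(lambda feed, x, y: x + y, a, b)


def multiply(a, b):
    return Node(lambda feed, x, y: x * y, a, b)


# Static style: describe the whole computation, then execute it with data.
x = placeholder("x")
graph = multiply(add(x, 2.0), 3.0)      # nothing is computed yet
static_result = graph.run({"x": 4.0})   # (4 + 2) * 3

# Eager style: each line computes its value as soon as it runs.
x_value = 4.0
eager_result = (x_value + 2.0) * 3.0

print(static_result, eager_result)  # 18.0 18.0
```

In the static style, the same graph object can be run again with different inputs, whereas in the eager style intermediate values are ordinary Python numbers that can be inspected at any point.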