Logistic regression model building


Okay, let's get started with building a real machine learning model. First, we'll introduce the machine learning problem we'll be working on: font classification. Then, we'll review a simple algorithm for classification, called logistic regression. Finally, we'll implement logistic regression in TensorFlow.

Introducing the font classification dataset

Before we jump in, let's load all the necessary modules:

import tensorflow as tf
import numpy as np

If you're copying and pasting code into IPython, make sure autoindent is turned off (the %autoindent magic toggles it):

%autoindent

The tqdm module is optional; it just shows nice progress bars. If it isn't installed, we fall back to a do-nothing wrapper:

try:
    from tqdm import tqdm
except ImportError:
    def tqdm(x, *args, **kwargs):
        return x

Next, we'll set a seed of 0, just to get consistent data splitting from run to run:

# Set random seed
np.random.seed(0)

In this book, we've provided a dataset of images of characters rendered in five fonts. For convenience, these are stored in a compressed NumPy file (data_with_labels.npz), which can be found in the download package of this book. You can easily load these into Python with numpy.load:

# Load data
data = np.load('data_with_labels.npz')
train = data['arr_0']/255.
labels = data['arr_1']

The train variable here holds the actual pixel values, scaled from 0 to 1, and labels holds the font class each image came from; therefore, it'll be either 0, 1, 2, 3, or 4, as there are five fonts in total. You can print out a few of these values to take a look, using the following code:

# Look at some data
print(train[0])
print(labels[0])

However, that's not very instructive, as most of the values are zeroes and only the central part of the image contains the character data.
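
If you want a quicker sense of where the character actually sits, one option (a small sketch, not part of the original walkthrough) is to crop the image to its nonzero rows and columns before printing:

# Crop the first image to the rows and columns that contain ink,
# so the print-out isn't dominated by zeros
img = train[0]
rows = np.any(img > 0, axis=1)
cols = np.any(img > 0, axis=0)
print(np.round(img[rows][:, cols], 1))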

If you have Matplotlib installed, now is a good place to import it. We'll use plt.ion() to automatically bring up figures when needed:

# If you have matplotlib installed
import matplotlib.pyplot as plt
plt.ion()

Here are some example images of characters from each font:

Yeah, they're pretty flashy. In the dataset, each image is represented as a 36 x 36 two-dimensional matrix of pixel darkness values. The 0 value represents a white pixel, while 255 represents a black pixel. Everything in between is a shade of gray. Here's the code to display these fonts on your own machine:

# Let's look at the letter A from each of the five fonts
f, plts = plt.subplots(5, sharex=True)
c = 91   # index of an 'A' in the first font
for i in range(5):
    # step 558 images forward to reach the same character in the next font
    plts[i].pcolor(train[c + i * 558],
                   cmap=plt.cm.gray_r)

If your plot appears really wide, you can easily resize the window just using your mouse. It's often much more work to resize it ahead of time in Python if you're simply plotting interactively. Our goal is to decide which font an image belongs to, given that we have many other labeled images of the fonts. To expand the dataset and help avoid overfitting, we have also jittered each character around in the 36 x 36 area, giving us nine times as many data points.
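
To get a feel for how much data this gives us, you can check the array shape and count the examples per font class (a quick check, not part of the original text; the exact counts depend on your copy of the dataset):

# How many images do we have, and how are they spread across the five fonts?
print(train.shape)                       # (number of images, 36, 36)
print(np.bincount(labels.astype(int)))   # examples per font class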

It may be helpful to come back to this after working with later models. It's important to keep the original data in mind, no matter how advanced the final model is.

Logistic regression

If you're familiar with linear regression, you're halfway toward understanding logistic regression. Basically, we're going to assign a weight to each pixel in the image, then take the weighted sum of those pixels (beta for weights and X for pixels). This will give us a score for that image being a particular font. Every font will have its own set of weights, as they will value pixels differently. To convert these scores into proper probabilities (represented by Y), we will use what's called the softmax function, which squashes each score to lie between 0 and 1 and forces the five probabilities to sum to exactly 1, as illustrated next. Whatever probability is the greatest for a particular image, we will classify it into the associated class.

You can read more about the theory of logistic regression in most statistical modeling textbooks. Written in its standard multinomial (softmax) form, the model looks like this:
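
P(Y = k \mid X) = \frac{e^{X\beta_k}}{\sum_{j=1}^{5} e^{X\beta_j}}

Here, \beta_k is the weight vector for font class k; the denominator sums over all five font classes, so the five probabilities add up to 1.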

One good reference that focuses on applications is William H. Greene's Econometric Analysis (Pearson, 2012).

Getting data ready

Implementing logistic regression is pretty easy in TensorFlow, and it will serve as scaffolding for more complex machine learning algorithms. First, we need to convert our integer labels into a one-hot format. This means that, instead of labeling an image with font class 2, we transform the label into [0, 0, 1, 0, 0]. That is, we put a 1 in position two (counting positions from zero, as is common in computer science) and a 0 for every other class. Here's the code for our to_onehot function:

def to_onehot(labels, nclasses=5):
    '''
    Convert labels to "one-hot" format.
    >>> a = [0, 1, 2, 3]
    >>> to_onehot(a, 5)
    array([[ 1.,  0.,  0.,  0.,  0.],
           [ 0.,  1.,  0.,  0.,  0.],
           [ 0.,  0.,  1.,  0.,  0.],
           [ 0.,  0.,  0.,  1.,  0.]])
    '''
    # One row per label, one column per class
    outlabels = np.zeros((len(labels), nclasses))
    for i, l in enumerate(labels):
        outlabels[i, l] = 1
    return outlabels

With this done, we can go ahead and call the function:

onehot = to_onehot(labels)

For the pixels, we don't really want a matrix in this case, so we'll flatten each 36 x 36 image into a one-dimensional vector of length 1,296, but this will come a little bit later. Also, recall that we've already rescaled the pixel values from the 0-255 range so that they fall between 0 and 1.
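
When the time comes, the flattening itself is a one-liner with NumPy's reshape. Here's a small sketch (it uses a separate train_flat name, purely for illustration, so the arrays used below are left untouched):

# Flatten each 36 x 36 image into a single 1,296-long vector
train_flat = train.reshape([-1, 36 * 36])
print(train_flat.shape)   # (number of images, 1296)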

Okay, our final piece of preparation is to split our dataset into training and validation sets. This will help us catch overfitting later on. The training set will be used to determine the weights in our logistic regression model, and the validation set will only be used to check how well those weights hold up on new data:

# Split data into training and validation
indices = np.random.permutation(train.shape[0])
valid_cnt = int(train.shape[0] * 0.1)
test_idx, training_idx = indices[:valid_cnt],\
                         indices[valid_cnt:]
test, train = train[test_idx,:],\
              train[training_idx,:]
onehot_test, onehot_train = onehot[test_idx,:],\
                            onehot[training_idx,:]
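
It's worth confirming that the split came out as expected; roughly 10 percent of the images should land in the validation arrays (a quick check, not part of the original code):

# Confirm the 90/10 split and that the label arrays line up
print(train.shape, test.shape)
print(onehot_train.shape, onehot_test.shape)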

Building a TensorFlow model

Okay, let's kick off the TensorFlow code by creating an interactive session:

sess = tf.InteractiveSession()

With this, we've started our first model in TensorFlow.

We're going to use a placeholder variable for x, which represents our input images. This is just to tell TensorFlow that we will supply the value for this node via feed_dict later on:

# These will be inputs
## Input pixels, flattened
x = tf.placeholder("float", [None, 1296])

Also, note that we can specify the shape of this tensor, and here we have used None as one of the sizes. The None size allows us to send an arbitrary number of data points into the algorithm at once for batch processing. Likewise, we'll use the variable y_ to hold our known labels, to be used for training later on:

## Known labels
y_ = tf.placeholder("float", [None,5])

To perform logistic regression, we need a set of weights (W). In fact, we need 1,296 weights for each of the five font classes, which gives W its [1296, 5] shape. Note that we also want to include an extra weight for each class as a bias (b). This is the same as adding an extra input variable that always takes the value 1:

# Variables
W = tf.Variable(tf.zeros([1296,5]))
b = tf.Variable(tf.zeros([5]))

With all these TensorFlow variables floating around, we need to make sure they get initialized. Let's do that now:

# Just initialize
sess.run(tf.global_variables_initializer())

Good job! You've got everything prepared. Now you can implement the softmax formula to compute probabilities. Because we set up our weights and input very carefully, TensorFlow makes this task very easy with just a call to tf.matmul and tf.nn.softmax:

# Define model
y = tf.nn.softmax(tf.matmul(x,W) + b)
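
Even before training, you can run the model on a few images to make sure the shapes line up. With all the weights still at zero, softmax should assign every class a probability of 0.2 (a quick sanity check, not part of the original walkthrough; it assumes you flatten the images first, as described earlier):

# Run the untrained model on four flattened images;
# with zero weights, each of the five classes gets probability 0.2
sample = train[:4].reshape([-1, 1296])
print(sess.run(y, feed_dict={x: sample}))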

That's it! You've implemented an entire machine learning classifier in TensorFlow. Nice work. But where do we get the values for the weights? Let's take a look at using TensorFlow to train the model.