Logistic regression training


First, you'll learn about the loss function for our machine learning classifier and implement it in TensorFlow. Then, we'll quickly train the model by evaluating the right TensorFlow node. Finally, we'll verify that our model is reasonably accurate and the weights make sense.

Developing the loss function

Optimizing our model really means minimizing how wrong we are. With our labels in one-hot style, it's easy to compare them with the class probabilities predicted by the model. The categorical cross-entropy function is a formal way to measure this. While the exact statistics are beyond the scope of this book, you can think of it as penalizing the model more for less accurate predictions. To compute it, we multiply our one-hot true labels element-wise with the natural log of the predicted probabilities, then sum these values and negate them. Conveniently, TensorFlow already includes this function as tf.nn.softmax_cross_entropy_with_logits() and we can just call that:

# Cross-entropy loss, averaged over the batch
cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(
            logits=y + 1e-50, labels=y_))

Note that we are adding a small error value of 1e-50 here to avoid numerical instability problems.
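
To make the formula concrete, here is a minimal NumPy sketch, not part of the book's code, that computes the same quantity by hand for a single example; the label and probability values are made up for illustration:

import numpy as np

# Hypothetical one-hot label and predicted class probabilities
label = np.array([0., 0., 1., 0., 0.])        # true class is index 2
probs = np.array([0.1, 0.1, 0.6, 0.1, 0.1])   # model's softmax output

# Multiply element-wise, sum, and negate
loss = -np.sum(label * np.log(probs))
print(loss)  # about 0.51; a confident correct prediction would be near 0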

Training the model

TensorFlow is convenient in that it provides built-in optimizers to take advantage of the loss function we just wrote. Gradient descent is a common choice and will slowly nudge our weights toward better results. This is the node that will update our weights:

# How we train
train_step = tf.train.GradientDescentOptimizer(
                0.02).minimize(cross_entropy)
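
If you're curious what the optimizer does under the hood, each run of this node roughly applies the update w = w - learning_rate * gradient. The following toy sketch, plain Python rather than TensorFlow and with a made-up quadratic function, shows the same idea on a single number:

# Toy illustration: minimize f(w) = (w - 3)**2 with gradient descent
w = 0.0
learning_rate = 0.02
for _ in range(500):
    grad = 2 * (w - 3)               # derivative of f with respect to w
    w = w - learning_rate * grad     # step downhill
print(w)  # converges toward 3.0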

Before we actually start training, we should specify a few more nodes to assess how well the model does:

# Define accuracy
correct_prediction = tf.equal(tf.argmax(y,1),
                     tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(
           correct_prediction, "float"))

The correct_prediction node is 1 if our model assigns the highest probability to the correct class, and 0 otherwise. The accuracy node averages these 0s and 1s over whatever data we feed in, giving us an overall sense of how well the model did.
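
As a quick sanity check of what these two nodes compute, here is the same logic in plain NumPy; the predictions and labels below are made up for illustration:

import numpy as np

# Hypothetical predicted probabilities and one-hot labels for three examples
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.5, 0.1, 0.4]])
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]])

correct = np.equal(np.argmax(y_pred, 1), np.argmax(y_true, 1))
print(correct.mean())  # 0.667 -- the third example's argmax misses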

When training in machine learning, we often want to use the same data point multiple times to squeeze all the information out. Each pass through the entire training data is called an epoch. Here, we're going to save both the training and validation accuracy every 10 epochs:

# Actually train
epochs = 1000
train_acc = np.zeros(epochs//10)
test_acc = np.zeros(epochs//10)
for i in tqdm(range(epochs)):
    # Record summary data, and the accuracy
    if i % 10 == 0:
        # Check accuracy on train set
        A = accuracy.eval(feed_dict={
            x: train.reshape([-1,1296]),
            y_: onehot_train})
        train_acc[i//10] = A
        # And now the validation set
        A = accuracy.eval(feed_dict={
            x: test.reshape([-1,1296]),
            y_: onehot_test})
        test_acc[i//10] = A
    train_step.run(feed_dict={
        x: train.reshape([-1,1296]),
        y_: onehot_train})

Note that we use feed_dict to pass in different data and read out different output values. Finally, train_step.run updates the model on every iteration. The whole run should take only a few minutes on a typical computer, much less if you're using a GPU, and a bit longer on an underpowered machine.
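
The same feed_dict mechanism works for any node in the graph. For instance, here is a sketch, reusing the session and variables defined above, that reads out the cross-entropy loss on the validation set:

# Evaluate the cross-entropy node on the validation data
val_loss = cross_entropy.eval(feed_dict={
    x: test.reshape([-1,1296]),
    y_: onehot_test})
print(val_loss)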

You just trained a model with TensorFlow; awesome!

Evaluating the model accuracy

After 1,000 epochs, let's take a look at the model. If you have Matplotlib installed, you can view the accuracies in a graphical plot; if not, you can still look at the numbers. For the final results, use the following code:

# Notice that accuracy flattens out
print(train_acc[-1])
print(test_acc[-1])

If you do have Matplotlib installed, you can use the following code to display the plot:

# Plot the accuracy curves
plt.figure(figsize=(6,6))
plt.plot(train_acc,'bo')
plt.plot(test_acc,'rx')

You should see something like the following plot (note that we used some random initialization, so it might not be exactly the same):

It seems like the validation accuracy flattens out after about 400-500 epochs; beyond this, our model may either be overfitting or not learning much more. Also, even though the final accuracy of about 40 percent might seem poor, recall that, with five classes, a totally random guess would only have 20 percent accuracy. With this limited dataset, the simple model is doing all it can.
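
One quick numerical check for overfitting is the gap between training and validation accuracy. Here is a small sketch reusing the arrays we recorded above:

# Gap between training and validation accuracy
gap = train_acc - test_acc
print(gap[-1])    # final gap; a large positive value suggests overfitting
print(gap.max())  # worst gap observed during training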

It's also often helpful to look at the learned weights. These can give you a clue as to what the model thinks is important. Let's plot them by pixel position for each class:

# Look at a subplot of the weights for each font
f, plts = plt.subplots(5, sharex=True)
for i in range(5):
    plts[i].pcolor(W.eval()[:,i].reshape([36,36]))

This should give you a result similar to the following (again, if the plot comes out very wide, you can resize the window to square it up):

We can see that the weights near the interior are important for some fonts, while the weights on the outside are essentially zero. This makes sense, since none of the font characters reach the corners of the images.
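
If you'd prefer a single summary image over five subplots, one option, an extra sketch not from the book that reuses the W and plt objects above, is to average the absolute weight values across the five classes for each pixel:

import numpy as np

# Average absolute weight per pixel across all five fonts
mean_w = np.abs(W.eval()).mean(axis=1).reshape([36, 36])
plt.figure(figsize=(4, 4))
plt.pcolor(mean_w)
plt.colorbar()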

Again, note that your final results might look a little different due to random initialization effects. Always feel free to experiment and change the parameters of the model; that's how you'll learn new things.

Summary

In this chapter, we installed TensorFlow and took our first small steps with basic computations. We then jumped into a machine learning problem, successfully building a decent model with just logistic regression and a few lines of TensorFlow code.

In the next chapter, we'll see TensorFlow in its prime with deep neural networks.