Neural Network Projects with Python

By: James Loy

Overview of this book

Neural networks are at the core of recent AI advances, providing some of the best solutions to many real-world problems, including image recognition, medical diagnosis, text analysis, and more. This book goes through some basic neural network and deep learning concepts, as well as some popular libraries in Python for implementing them. It contains practical demonstrations of neural networks in domains such as fare prediction, image classification, sentiment analysis, and more. In each case, the book provides a problem statement, the specific neural network architecture required to tackle that problem, the reasoning behind the algorithm used, and the associated Python code to implement the solution from scratch. In the process, you will gain hands-on experience with popular Python libraries such as Keras to build and train your own neural networks. By the end of this book, you will have mastered the different neural network architectures and created cutting-edge AI projects in Python that will immediately strengthen your machine learning portfolio.

TensorFlow and Keras – open source deep learning libraries

TensorFlow is an open source library for neural networks and deep learning developed by the Google Brain team. Designed for scalability, TensorFlow runs on a variety of platforms, from desktops to mobile devices, and even on clusters of computers. Today, TensorFlow is one of the most popular machine learning libraries and is used extensively in a wide variety of real-world applications. For example, TensorFlow powers the AI behind many online services that we use today, including image search, voice recognition, and recommendation engines. TensorFlow has become the silent workhorse powering many AI applications, even if we rarely notice it.

Keras is a high-level API that runs on top of TensorFlow. So, why Keras? Why do we need another library to act as an API for TensorFlow? To put it simply, Keras removes the complexities of building neural networks, enabling rapid experimentation and testing without burdening the user with low-level implementation details. Keras provides a simple and intuitive API for building neural networks using TensorFlow. Its guiding principles are modularity and extensibility. As we shall see later, it is extremely easy to build neural networks by stacking Keras API calls on top of one another, much like stacking Lego blocks to create bigger structures. This beginner-friendly approach has made Keras one of the most popular machine learning libraries in Python. In this book, we will use Keras as the primary machine learning library for building our neural network projects.

The fundamental building blocks in Keras

The fundamental building blocks in Keras are layers, and we can stack layers linearly to create a model. The loss function that we choose provides the error metric that an optimizer uses to train our model. Recall that when we built our neural network from scratch earlier, we had to define and write the code for each of these components ourselves. We call these the fundamental building blocks in Keras because we can build any neural network using these basic structures.

The following diagram illustrates the relationship between these building blocks in Keras:

Layers – the atom of neural networks in Keras

You can think of layers in Keras as atoms, because they are the smallest units of our neural network. Each layer takes in an input, performs a mathematical function on it, and then passes its output to the next layer. The core layers in Keras include dense layers, activation layers, and dropout layers. There are other layers that are more complex, including convolutional layers and pooling layers. In this book, you will be exposed to projects that use all of these layers, as the import sketch below shows.
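
All of these layer types are imported from keras.layers. As a quick sketch of the imports:

from keras.layers import Dense, Activation, Dropout
# More complex layers, used in image-based projects later in this book
from keras.layers import Conv2D, MaxPooling2D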

For now, let's take a closer look at dense layers, which are by far the most common type of layer used in Keras. A dense layer is also known as a fully-connected layer. It is fully-connected because it uses all of its input (as opposed to a subset of the input) for the mathematical function that it implements.

A dense layer implements the following function:

y = f(Wx + b)

Here, y is the output, f is the activation function, x is the input, and W and b are the weights and biases respectively.
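
To make this concrete, here is a minimal NumPy sketch of the computation a dense layer performs, assuming a sigmoid activation function; the input and weight values are illustrative only:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0, 3.0])   # input vector with 3 features
W = np.random.randn(4, 3)       # weight matrix for a 4-unit dense layer
b = np.zeros(4)                 # one bias per unit
y = sigmoid(W @ x + b)          # the layer's output
print(y.shape)                  # (4,)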

This equation should look familiar to you. We used the fully-connected layer when we were building our neural network from scratch earlier.

Models – a collection of layers

If layers can be thought of as atoms, then models can be thought of as molecules in Keras. A model is simply a collection of layers, and the most commonly used model in Keras is the Sequential model. A Sequential model allows us to stack layers linearly on top of one another, where each layer is connected to exactly one other layer. This allows us to easily design model architectures without worrying about the underlying math. As we will see in later chapters, ensuring that consecutive layer dimensions are compatible with one another can require a significant amount of thought, something that Keras takes care of for us under the hood!

Once we have defined our model architecture, we need to define our training process, which is done using the compile method in Keras. The compile method takes in several arguments, but the most important ones we need to define are the optimizer and the loss function.
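
As a minimal sketch (assuming model is a Sequential model that we have already defined), a typical call uses Keras's built-in string identifiers for both arguments:

model.compile(optimizer='sgd', loss='mean_squared_error')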

Loss function – error metric for neural network training

In an earlier section, we defined the loss function as a way to evaluate the goodness of our predictions (that is, how far off our predictions are). The nature of our problem should dictate the loss function used. There are several loss functions implemented in Keras, but the most commonly used loss functions are mean_squared_error, categorical_crossentropy, and binary_crossentropy.

As a general rule of thumb, this is how you should choose which loss function to use:

  • mean_squared_error if the problem is a regression problem
  • categorical_crossentropy if the problem is a multiclass classification problem
  • binary_crossentropy if the problem is a binary classification problem

In certain cases, you might find that the default loss functions in Keras are unsuitable for your problem. In that case, you can define your own loss function by defining a custom function in Python, then passing that custom function to the compile method in Keras.
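
As a minimal sketch of this, here is a custom loss function written using the Keras backend (keras.backend) so that it remains differentiable during training; the function name and the tiny model are illustrative only:

from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense

def custom_mean_squared_error(y_true, y_pred):
    # Average the squared differences between targets and predictions
    return K.mean(K.square(y_pred - y_true), axis=-1)

model = Sequential()
model.add(Dense(units=1, input_dim=3))
# Pass the function itself (not a string) to the compile method
model.compile(optimizer='sgd', loss=custom_mean_squared_error)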

Optimizers – training algorithm for neural networks

An optimizer is an algorithm for updating the weights of the neural network in the training process. Optimizers in Keras are based on the gradient descent algorithm, which we have covered in an earlier section.

While we won't cover in detail the differences between each optimizer, it is important to note that our choice of optimizer should depend on the nature of the problem. In general, researchers have found that the Adam optimizer works best for deep neural networks, while the sgd optimizer works best for shallow neural networks. The Adagrad optimizer is also a popular choice, and it adapts the learning rate of the algorithm based on how frequently a particular set of weights is updated. The main advantage of this approach is that it eliminates the need to manually tune the learning rate hyperparameter, which is a time-consuming process in the machine learning workflow.
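
As a sketch of how these optimizers are selected and configured in Keras (the learning rates shown are illustrative defaults rather than tuned choices, and model is assumed to be a Sequential model defined earlier):

from keras import optimizers

sgd = optimizers.SGD(lr=0.01)      # often a good fit for shallow networks
adam = optimizers.Adam(lr=0.001)   # a common default for deep networks
adagrad = optimizers.Adagrad()     # adapts per-parameter learning rates

model.compile(optimizer=adam, loss='categorical_crossentropy')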

Creating neural networks in Keras

Let's take a look at how we can use Keras to build the two-layer neural network that we introduced earlier. To build a linear collection of layers, first declare a Sequential model in Keras:

from keras.models import Sequential
model = Sequential()

This creates an empty Sequential model that we can now add layers to. Adding layers in Keras is simple and similar to stacking Lego blocks on top of one another. We start by adding layers from the left (the layer closest to the input):

from keras.layers import Dense
# Layer 1
model.add(Dense(units=4, activation='sigmoid', input_dim=3))
# Output Layer
model.add(Dense(units=1, activation='sigmoid'))

Stacking layers in Keras is as simple as calling the model.add() command. Notice that we had to define the number of units in each layer. Generally, increasing the number of units increases the complexity of the model, as it means that there are more weights to be trained. For the first layer, we had to define input_dim. This informs Keras of the number of features (that is, columns) in the dataset. Also, note that we have used a Dense layer. A Dense layer is simply a fully connected layer. In later chapters, we will introduce other kinds of layers, specific to different types of problems.

We can verify the structure of our model by calling the model.summary() function:

print(model.summary())

The output is a table summarizing each layer, its output shape, and its number of parameters.

The number of params is the number of weights and biases we need to train for the model that we have just defined.
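
We can verify this count by hand: the first layer has 3 × 4 = 12 weights plus 4 biases (16 parameters), and the output layer has 4 × 1 = 4 weights plus 1 bias (5 parameters), for a total of 21 trainable parameters.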

Once we are satisfied with our model's architecture, let's compile it and start the training process:

from keras import optimizers
# Stochastic gradient descent with a learning rate of 1.0
sgd = optimizers.SGD(lr=1)
model.compile(loss='mean_squared_error', optimizer=sgd)

Note that we have defined the learning rate of the sgd optimizer to be 1.0 (lr=1). In general, the learning rate is a hyperparameter of the neural network that needs to be tuned carefully depending on the problem. We will take a closer look at tuning hyperparameters in later chapters.

The mean_squared_error loss function in Keras is similar to the sum-of-squares error that we defined earlier. We are using the SGD optimizer to train our model. Recall that gradient descent updates the weights and biases by moving them in the direction opposite to the gradient of the loss function with respect to those weights and biases.
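
Written out, the update rule applied to each weight w is:

w ← w − η × ∂L/∂w

Here, L is the loss function and η is the learning rate (1.0 in the preceding code).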

Let's use the same data that we used earlier to train our neural network. This will allow us to compare the predictions obtained using Keras versus the predictions obtained when we created our neural network from scratch earlier.

Let's define X and y as NumPy arrays, corresponding to the features and the target variable respectively:

import numpy as np
# Fixing a random seed ensures reproducible results
np.random.seed(9)

X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
y = np.array([[0], [1], [1], [0]])

Finally, let's train the model for 1500 epochs:

model.fit(X, y, epochs=1500, verbose=False)

To get the predictions, run the model.predict() command on our data:

print(model.predict(X))

The preceding code prints the model's predictions for each of the four input rows.

Comparing this to the predictions that we obtained earlier, we can see that the results are extremely similar. The major advantage of using Keras is that we did not have to worry about the low-level implementation details and mathematics while building our neural network, unlike what we did earlier. In fact, we did no math at all. All we did in Keras was to call a series of APIs to build our neural network. This allows us to focus on high-level details, enabling rapid experimentation.