Hands-On Neural Networks

By: Leonardo De Marchi, Laura Mitchell

Overview of this book

Neural networks play a very important role in deep learning and artificial intelligence (AI), with applications in a wide variety of domains, ranging from medical diagnosis to financial forecasting and machine diagnostics. Hands-On Neural Networks is designed to guide you through learning about neural networks in a practical way. The book gets you started with a brief introduction to perceptron networks. You will then gain insights into machine learning and understand what the future of AI could look like. Next, you will study how embeddings can be used to process textual data, and the role of long short-term memory networks (LSTMs) in solving common natural language processing (NLP) problems. The later chapters demonstrate how you can implement advanced concepts, including transfer learning, generative adversarial networks (GANs), autoencoders, and reinforcement learning. Finally, you can look forward to further content on the latest advancements in the field of neural networks. By the end of this book, you will have the skills you need to build, train, and optimize your own neural network models that can be used to provide predictable solutions.
Table of Contents (16 chapters)

Section 1: Getting Started
Section 2: Deep Learning Applications
Section 3: Advanced Applications

Supervised learning algorithms

There are a lot of supervised learning algorithms at our disposal. We choose the algorithm based on the task and the data available. If we don't have much data and there is already some domain knowledge around our problem, deep learning is probably not the best approach to start with. We should instead try simpler algorithms and engineer relevant features based on the knowledge we have.

Starting simple is always good practice. For classification, a good starting point can be a decision tree; a related ensemble algorithm, random forest, is harder to overfit and gives good results out of the box. For regression problems, linear regression is still very popular, especially in domains where it's necessary to justify the decisions taken. For other problems, such as recommender systems, a good starting point can be matrix factorization. Each domain has a standard algorithm that it is better to start with.
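As a quick, minimal sketch of such an out-of-the-box baseline (the dataset and parameters here are just illustrative), a random forest can be trained in scikit-learn in a few lines:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load a small toy dataset bundled with scikit-learn
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A random forest with default parameters often works well out of the box
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print('Test accuracy: {:.2f}'.format(clf.score(X_test, y_test)))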

A simple example of a task could be to predict the price of a house for sale, given its location and some information about it. This is a regression problem, and there is a set of algorithms in scikit-learn that can perform the task. If we want to use a linear regression, we can do the following:

from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Using a standard dataset that we can find in scikit-learn
cal_house = fetch_california_housing()

# Split the data and the targets into training/testing sets
cal_house_X_train = cal_house.data[:-20]
cal_house_X_test = cal_house.data[-20:]
cal_house_y_train = cal_house.target[:-20]
cal_house_y_test = cal_house.target[-20:]

# Create the linear regression object and train it on the training sets
regr = LinearRegression()
regr.fit(cal_house_X_train, cal_house_y_train)

# Calculating the predictions
predictions = regr.predict(cal_house_X_test)

# Calculating the loss
print('MSE: {:.2f}'.format(mean_squared_error(cal_house_y_test, predictions)))

It's possible to run this code after activating our virtual environment (or conda environment) and saving it in a file named house_LR.py. Then, from the directory where you placed the file, run the following command:

 python house_LR.py

The interesting part about NNs is that they can be used for any of the tasks mentioned previously, provided that enough data is available. Moreover, a trained neural network has effectively learned to do feature engineering itself, and part of the network can be reused to produce features for similar tasks. This method is called transfer learning (TL), and we will dedicate a chapter to it later on.

Metrics

The metric chosen to evaluate the algorithm is another extremely important step in the machine learning process. One particular metric can also be chosen as the loss that the algorithm aims to minimize. The loss is a measure of the error that our algorithm produces when we compare its predictions to our ground truth. The loss is very important, as it determines how the algorithm evaluates its mistakes and, therefore, how it learns the function that maps the inputs to the outputs.
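To make the notion of a loss concrete, the mean squared error, for example, averages the squared differences between predictions and ground truth; a minimal sketch:

import numpy as np

def mean_squared_error(y_true, y_pred):
    # Average of the squared differences between predictions and targets
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

print(mean_squared_error([3.0, 2.0, 4.0], [2.5, 2.0, 5.0]))  # 0.4166...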

Again, we can divide the metrics by the type of problem we have: metrics for classification or metrics for regression.

Regression metrics

In Keras, we can find the following important regression metrics:

  • Mean Squared Error: mean_squared_error, MSE, or mse
  • Mean Absolute Error: mean_absolute_error, MAE, or mae
  • Mean Absolute Percentage Error: mean_absolute_percentage_error, MAPE, or mape
  • Cosine Proximity: cosine_proximity, cosine

In Keras, you specify both the loss you are optimizing for and any metrics you want to track when compiling the model, after it has been instantiated, as shown in the sketch below. We will see later in the book how to choose the metrics we are interested in.
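For example, assuming a simple placeholder architecture (the layer sizes here are arbitrary), the loss and the metrics are passed to compile():

from keras.models import Sequential
from keras.layers import Dense

# Placeholder architecture: a single-output regression model on 8 features
model = Sequential()
model.add(Dense(1, input_dim=8))

# The loss (MSE) and any extra metrics (MAE, MAPE) are specified at compile time
model.compile(optimizer='sgd', loss='mse', metrics=['mae', 'mape'])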

Classification metrics

In Keras, we can find the following classification metrics:

  • Binary accuracy: This measures the accuracy of the result of a binary classification problem. In Keras, it's possible to use the binary_accuracy and acc functions.
  • ROC AUC: This measures the area under the ROC curve of a binary classification problem. Keras does not provide a simple string alias for it; in tf.keras it is available as the tf.keras.metrics.AUC class, and otherwise it can be implemented as a custom metric.
  • Categorical accuracy: This measures the accuracy of the result of a multiclass classification problem. In Keras, it's possible to use the categorical_accuracy and acc functions.
  • Sparse categorical accuracy: This has the same functionality as categorical accuracy, but for problems where the labels are given as integer indices rather than one-hot vectors; use sparse_categorical_accuracy.
  • Top k categorical accuracy: This returns the accuracy of a prediction when the true class is among the model's top k predictions. In Keras, it's possible to use the top_k_categorical_accuracy function (this requires you to specify a k parameter); see the sketch after this list.
  • Sparse top k categorical accuracy: This returns the same top k accuracy for integer labels. In Keras, it's possible to use the sparse_top_k_categorical_accuracy function (this also requires you to specify a k parameter).
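As a sketch of how these metrics are used at compile time (the architecture is just a placeholder, and top_3_accuracy is a hypothetical wrapper name), a multiclass model could be set up as follows:

from keras.models import Sequential
from keras.layers import Dense
from keras.metrics import top_k_categorical_accuracy

# Placeholder architecture: a 10-class classifier on 20 input features
model = Sequential()
model.add(Dense(10, input_dim=20, activation='softmax'))

def top_3_accuracy(y_true, y_pred):
    # Hypothetical wrapper that sets k=3 instead of the default k=5
    return top_k_categorical_accuracy(y_true, y_pred, k=3)

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['categorical_accuracy', top_3_accuracy])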

The first thing we need to determine is whether the dataset is balanced or unbalanced with regard to the classes we need to predict. If the dataset is unbalanced (for example, 99% of the instances belong to only one class), metrics such as accuracy can be misleading: a system that always predicts the most common class will score very well on them while being totally useless. That's why it's important to choose metrics that are meaningful for the system we are modeling. In this case, for example, ROC AUC is a better choice, as it measures how well the model separates the two classes across all decision thresholds rather than at a single cut-off.
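A minimal sketch illustrates the problem, using scikit-learn's metric functions on a made-up 99%/1% dataset:

import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Made-up unbalanced ground truth: 99% of the instances belong to class 0
y_true = np.array([0] * 99 + [1])

# A useless model that always predicts the majority class
y_pred = np.zeros(100)

print('Accuracy: {:.2f}'.format(accuracy_score(y_true, y_pred)))  # 0.99, looks great
print('ROC AUC: {:.2f}'.format(roc_auc_score(y_true, y_pred)))    # 0.50, no better than random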

Evaluating the model

To evaluate an algorithm, it's necessary to judge its performance on data that was not used to train the model. For this reason, it's common to split the data into a training set and a test set. The training set is used to train the model, which means that it's used to find the parameters of our algorithm; for example, training a decision tree will determine the values and variables that create the splits of the branches of the tree. The test set must remain totally hidden from the training. That means that all operations, such as feature engineering or feature scaling, must be fitted on the training set only and then applied to the test set, as in the following example.

Usually, the training set will be 70-80% of the dataset, while the test set will be the rest:

from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.linear_model import LinearRegression
from sklearn import datasets

# import some data
iris = datasets.load_iris()

X_train, X_test, y_train, y_test = train_test_split(
iris.data, iris.target, test_size=0.3, random_state=0)

# Fit the scaler on the training data only, then apply it to both sets
scaler = preprocessing.StandardScaler().fit(X_train)
X_train_transformed = scaler.transform(X_train)
X_test_transformed = scaler.transform(X_test)

clf = LinearRegression().fit(X_train_transformed, y_train)

predictions = clf.predict(X_test_transformed)

print('Predictions: ', predictions)

The most common way to evaluate a supervised learning algorithm offline is cross-validation. This technique consists of dividing the dataset into training and test sets multiple times, each time using one part for training and the other for testing. This allows us not only to check for overfitting but also to evaluate the variance in our loss across splits.
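As a minimal sketch (the dataset and estimator here are just illustrative), scikit-learn's cross_val_score runs this loop for us and returns one score per fold:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: each fold is used once as the test set
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print('Score per fold:', scores)
print('Mean: {:.2f}, standard deviation: {:.2f}'.format(scores.mean(), scores.std()))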

For problems where it's not possible to randomly divide the data, such as in a time series, scikit-learn has other splitting methods, such as the TimeSeriesSplit class.
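A short sketch of TimeSeriesSplit on ten ordered observations shows how each split trains on the past and tests on the time steps that follow:

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Ten ordered observations, as in a time series
X = np.arange(10).reshape(-1, 1)

# Each split trains on earlier observations and tests on later ones
tscv = TimeSeriesSplit(n_splits=3)
for train_index, test_index in tscv.split(X):
    print('train:', train_index, 'test:', test_index)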

In Keras, it's possible to hold out part of the data for validation directly during fit:

hist = model.fit(x, y, validation_split=0.2)

If the data does not fit in memory, it's also possible to use train_on_batch and test_on_batch.
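A minimal sketch, assuming model is an already compiled Keras model and batch_generator is a hypothetical helper that yields (x_batch, y_batch) NumPy arrays one batch at a time:

# batch_generator is a hypothetical helper, not a Keras function
for epoch in range(10):
    for x_batch, y_batch in batch_generator('train.csv'):
        loss = model.train_on_batch(x_batch, y_batch)

# Evaluation works the same way, one batch at a time
for x_batch, y_batch in batch_generator('test.csv'):
    test_loss = model.test_on_batch(x_batch, y_batch)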

For image data in Keras, it is also possible to use the folder structure to define the training and validation sets and their labels. To accomplish this, we use the flow_from_directory method of the ImageDataGenerator class, which will load the data with the labels and the train/validation split as specified. We will need the following directory structure:

data/
    train/
        category1/
            001.jpg
            002.jpg
            ...
        category2/
            003.jpg
            004.jpg
            ...
    validation/
        category1/
            0011.jpg
            0022.jpg
            ...
        category2/
            0033.jpg
            0044.jpg
            ...

The method can be called with the following parameters:

flow_from_directory(directory, target_size=(96, 96), color_mode='rgb', classes=None, class_mode='categorical', batch_size=128, shuffle=True, seed=11, save_to_dir=None, save_prefix='output', save_format='jpg', follow_links=False, subset=None, interpolation='nearest')
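For example, assuming a compiled Keras model named model, the following sketch reads the directory structure shown previously and infers the labels from the folder names:

from keras.preprocessing.image import ImageDataGenerator

# flow_from_directory is a method of the ImageDataGenerator class
train_datagen = ImageDataGenerator(rescale=1./255)
validation_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    'data/train', target_size=(96, 96), batch_size=128, class_mode='categorical')
validation_generator = validation_datagen.flow_from_directory(
    'data/validation', target_size=(96, 96), batch_size=128, class_mode='categorical')

# model is assumed to be an already compiled Keras model
model.fit_generator(train_generator, validation_data=validation_generator, epochs=10)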

TensorBoard

TensorFlow provides a handy way to visualize a variety of important aspects of our network. To be able to use this useful tool, Keras will need to create some log files that TensorBoard will read.

A way to do this is to use callbacks. A callback is a set of functions applied at specified stages of the model's training. It is possible to use these functions to get a view of the internal states and statistics of the model while it's training. It is possible to pass a list of callbacks to the .fit() method of a Keras model. The relevant methods of the callbacks will then be called at each stage of the training.

Here is an example of a TensorBoard callback:

tb_callback = keras.callbacks.TensorBoard(log_dir='./Graph', histogram_freq=0,
                                          write_graph=True, write_images=True)
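The callback only takes effect once it is passed to the model's fit() method; a minimal sketch, assuming x_train and y_train are already defined:

model.fit(x_train, y_train, epochs=10, callbacks=[tb_callback])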

Then it's possible to launch the TensorBoard interface to visualize not only the graph, but also the metrics, the loss, or even the word embeddings.

To launch TensorBoard from a terminal window, simply type in the following:

tensorboard --logdir=path/to/log-directory

This command will start a server, and it will be possible to access it at http://localhost:6006. With TensorBoard, it will be possible to easily compare the performance of different network architectures or parameters:

[Screenshot: a running TensorBoard session]