Python Deep Learning Cookbook

By: Indra den Bakker

Overview of this book

Deep Learning is revolutionizing a wide range of industries. For many applications, deep learning has proven to outperform humans by making faster and more accurate predictions. This book provides a top-down and bottom-up approach to demonstrate deep learning solutions to real-world problems in different areas. These applications include Computer Vision, Natural Language Processing, Time Series, and Robotics. The Python Deep Learning Cookbook presents technical solutions to the issues presented, along with a detailed explanation of the solutions. Furthermore, a discussion on corresponding pros and cons of implementing the proposed solution using one of the popular frameworks like TensorFlow, PyTorch, Keras and CNTK is provided. The book includes recipes that are related to the basic concepts of neural networks, as well as classical network topologies. The main purpose of this book is to provide Python programmers with a detailed list of recipes to apply deep learning to common and not-so-common scenarios.

Title Page

Credits

About the Author

About the Reviewer

www.PacktPub.com

Customer Feedback

Preface

Programming Environments, GPU Computing, Cloud Solutions, and Deep Learning Frameworks

Introduction

Setting up a deep learning environment

Launching an instance on Amazon Web Services (AWS)

Launching an instance on Google Cloud Platform (GCP)

Installing CUDA and cuDNN

Installing Anaconda and libraries

Connecting with Jupyter Notebooks on a server

Building state-of-the-art, production-ready models with TensorFlow

Intuitively building networks with Keras

Using PyTorch’s dynamic computation graphs for RNNs

Implementing high-performance models with CNTK

Building efficient models with MXNet

Defining networks using simple and efficient code with Gluon

Feed-Forward Neural Networks

Introduction

Understanding the perceptron

Implementing a single-layer neural network

Building a multi-layer neural network

Getting started with activation functions

Experiment with hidden layers and hidden units

Implementing an autoencoder

Tuning the loss function

Experimenting with different optimizers

Improving generalization with regularization

Adding dropout to prevent overfitting

Convolutional Neural Networks

Introduction

Applying pooling layers

Optimizing with batch normalization

Understanding padding and strides

Experimenting with different types of initialization

Implementing a convolutional autoencoder

Applying a 1D CNN to text

Recurrent Neural Networks

Introduction

Implementing a simple RNN

Adding Long Short-Term Memory (LSTM)

Using gated recurrent units (GRUs)

Implementing bidirectional RNNs

Character-level text generation

Reinforcement Learning

Introduction

Implementing policy gradients

Implementing a deep Q-learning algorithm

Generative Adversarial Networks

Introduction

Understanding GANs

Implementing Deep Convolutional GANs (DCGANs)

Upscaling the resolution of images with Super-Resolution GANs (SRGANs)

Computer Vision

Introduction

Augmenting images with computer vision techniques

Classifying objects in images

Localizing an object in images

Segmenting classes in images with U-net

Scene understanding (semantic segmentation)

Finding facial key points

Recognizing faces

Transferring styles to images

Natural Language Processing

Introduction

Analyzing sentiment

Translating sentences

Summarizing text

Speech Recognition and Video Analysis

Introduction

Implementing a speech recognition pipeline from scratch

Identifying speakers with voice recognition

Understanding videos with deep learning

Time Series and Structured Data

Introduction

Predicting stock prices with neural networks

Predicting bike sharing demand

Using a shallow neural network for binary classification

Game Playing Agents and Robotics

Introduction

Learning to drive a car with end-to-end learning

Learning to play games with deep reinforcement learning

Genetic Algorithm (GA) to optimize hyperparameters

Hyperparameter Selection, Tuning, and Neural Network Learning

Introduction

Visualizing training with TensorBoard and Keras

Working with batches and mini-batches

Using grid search for parameter tuning

Learning rates and learning rate schedulers

Comparing optimizers

Determining the depth of the network

Adding dropouts to prevent overfitting

Making a model more robust with data augmentation

Network Internals

Introduction

Visualizing training with TensorBoard

Analyzing network weights and more

Freezing layers

Storing the network topology and trained weights

Pretrained Models

Introduction

Large-scale visual recognition with GoogLeNet/Inception

Extracting bottleneck features with ResNet

Leveraging pretrained VGG models for new classes

Fine-tuning with Xception

Identifying speakers with voice recognition

Next to speech recognition, there is more we can do with sound fragments. While speech recognition focuses on converting speech (spoken words) to digital data, we can also use sound fragments to identify the person who is speaking. This is also known as voice recognition. Every individual has different characteristics when speaking, caused by differences in anatomy and behavioral patterns. Speaker verification and speaker identification are getting more attention in this digital age. For example, a home digital assistant can automatically detect which person is speaking.

In the following recipe, we'll be using the same data as in the previous recipe, where we implemented a speech recognition pipeline. However, this time, we will be classifying the speakers of the spoken numbers.

How to do it...

In this recipe, we start by importing all libraries:

import glob
import numpy as np
import random
import librosa
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer

import keras
from keras.layers import LSTM, Dense, Dropout, Flatten
from keras.models import Sequential
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping, ModelCheckpoint

Let's set SEED and the location of the .wav files:

SEED = 2017
DATA_DIR = 'Data/spoken_numbers_pcm/'

Let's split the .wav files into a training set and a validation set with scikit-learn's train_test_split function:

files = glob.glob(DATA_DIR + "*.wav")
X_train, X_val = train_test_split(files, test_size=0.2, random_state=SEED)

print('# Training examples: {}'.format(len(X_train)))
print('# Validation examples: {}'.format(len(X_val)))
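
To make the effect of test_size and random_state concrete, here is a minimal standard-library sketch of a seeded 80/20 split. It is only an illustration of the idea, not scikit-learn's implementation, so the files that end up in each set will differ from those produced by train_test_split. The helper name split_files and the clip names are hypothetical.

```python
import math
import random

def split_files(files, test_size=0.2, seed=2017):
    """Shuffle a copy of the file list and cut off a validation share."""
    shuffled = list(files)                    # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)     # same seed gives the same order
    n_val = math.ceil(test_size * len(shuffled))
    return shuffled[n_val:], shuffled[:n_val]

# Hypothetical file names, used only to show the resulting sizes
example_files = ['clip_{}.wav'.format(i) for i in range(10)]
train_files, val_files = split_files(example_files)
print(len(train_files), len(val_files))
```

Because the shuffle is seeded, calling split_files again with the same arguments returns the same two lists, which is what makes the experiment repeatable.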

To extract and print all unique labels, we use the following code:

labels = []
for i in range(len(X_train)):
    label = X_train[i].split('/')[-1].split('_')[1]
    if label not in labels:
        labels.append(label)
print(labels)
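
The label extraction relies on the speaker name being the second underscore-separated field of each file name. The sketch below isolates that parsing step. It assumes file names shaped like `<digit>_<speaker>_<number>.wav`; the example paths are hypothetical and speaker_from_path is an illustrative helper, not part of the recipe.

```python
import os

def speaker_from_path(path):
    """Return the second underscore-separated field of the file name."""
    filename = os.path.basename(path)    # drop the directory part
    return filename.split('_')[1]        # '5_Agnes_100.wav' -> 'Agnes'

# Hypothetical file names, used only to illustrate the parsing
example_files = [
    'Data/spoken_numbers_pcm/5_Agnes_100.wav',
    'Data/spoken_numbers_pcm/3_Bruce_120.wav',
    'Data/spoken_numbers_pcm/9_Agnes_140.wav',
]

# A set keeps each speaker once; sorting gives a stable order
speakers = sorted({speaker_from_path(f) for f in example_files})
print(speakers)
```

A file whose name does not follow this pattern would either raise an IndexError or produce a wrong label, so it can be worth checking the printed labels before training.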

We can now define our one_hot_encode function as follows:

label_binarizer = LabelBinarizer()
label_binarizer.fit(list(set(labels)))

def one_hot_encode(x): return label_binarizer.transform(x)
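
To show what the encoded targets look like, here is a NumPy-only sketch of one-hot encoding. It mimics the output we expect from LabelBinarizer when there are three or more classes (one column per class, classes in sorted order), but it is not scikit-learn's implementation. The speaker names and the helper one_hot are hypothetical.

```python
import numpy as np

def one_hot(labels, classes):
    """Encode each label as a row with a single 1 at its class index."""
    index = {c: i for i, c in enumerate(classes)}
    encoded = np.zeros((len(labels), len(classes)), dtype=int)
    for row, label in enumerate(labels):
        encoded[row, index[label]] = 1
    return encoded

classes = sorted(['Bruce', 'Agnes', 'Victoria'])    # hypothetical speakers
encoded = one_hot(['Agnes', 'Victoria'], classes)
print(encoded)
```

Each row has exactly one 1, and the array has one row per label, which is the shape a softmax output layer with categorical cross-entropy expects.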

Before we can feed the data to our network, some preprocessing needs to be done. We use the following settings:

n_features = 20
max_length = 80
n_classes = len(labels)

We can now define our batch generator. The generator handles all preprocessing tasks, such as reading a .wav file and transforming it into usable input:

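def batch_generator(data, batch_size=16):
    while True:
        # Reshuffle so that every batch is a fresh random sample of the files
        random.shuffle(data)
        X, y = [], []
        for i in range(batch_size):
            wav = data[i]
            wave, sr = librosa.load(wav, mono=True)
            # The speaker name is the second field of the file name
            label = wav.split('/')[-1].split('_')[1]
            # Encode a one-element list and keep its single row
            y.append(one_hot_encode([label])[0])
            # 20 MFCCs per frame by default, which matches n_features
            mfcc = librosa.feature.mfcc(y=wave, sr=sr)
            # Truncate long clips, then zero-pad short ones to max_length frames
            mfcc = mfcc[:, :max_length]
            mfcc = np.pad(mfcc, ((0, 0), (0, max_length - mfcc.shape[1])),
                          mode='constant', constant_values=0)
            X.append(mfcc)
        yield np.array(X), np.array(y)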

Note

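Please note the difference in our batch generator compared to the previous recipe: the label is now the speaker taken from the file name, because this time we are classifying speakers.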
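
Every array in a batch must have the same shape, which is why the generator pads each MFCC matrix to max_length frames. The following NumPy sketch isolates that step. Cutting over-long clips first is an added assumption: np.pad rejects negative pad widths, so a clip with more than max_length frames has to be shortened before padding. The arrays are dummy stand-ins for real MFCCs and pad_or_truncate is a hypothetical helper.

```python
import numpy as np

def pad_or_truncate(features, max_length):
    """Force a (n_features, n_frames) array to exactly max_length frames."""
    features = features[:, :max_length]         # cut clips that are too long
    missing = max_length - features.shape[1]    # frames still to fill
    return np.pad(features, ((0, 0), (0, missing)),
                  mode='constant', constant_values=0)   # zero-pad on the right

short_clip = np.ones((20, 35))     # stand-in for MFCCs of a short recording
long_clip = np.ones((20, 120))     # stand-in for a recording that is too long

padded = pad_or_truncate(short_clip, 80)
cut = pad_or_truncate(long_clip, 80)
print(padded.shape, cut.shape)
```

Both results have the shape (20, 80), so they can be stacked into one batch array with np.array.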

Let's define the hyperparameters before defining our network architecture:

learning_rate = 0.001
batch_size = 64
n_epochs = 50
dropout = 0.5

input_shape = (n_features, max_length)
steps_per_epoch = 50
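
When training from a generator, the length of an epoch is set by steps_per_epoch rather than by the size of the dataset. The short calculation below works out how many examples that means for the settings above. The validation batch size of 32 and the 5 validation steps are the values passed to the training call later in this recipe.

```python
batch_size = 64
steps_per_epoch = 50
n_epochs = 50
val_batch_size = 32       # batch size of the validation generator
validation_steps = 5

# Examples drawn per epoch = examples per batch * batches per epoch
train_samples_per_epoch = batch_size * steps_per_epoch
val_samples_per_epoch = val_batch_size * validation_steps

# Upper bound on examples drawn if early stopping never triggers
max_train_samples = train_samples_per_epoch * n_epochs

print(train_samples_per_epoch, val_samples_per_epoch, max_train_samples)
```

Because the generator reshuffles the file list before every batch, the same file can appear more than once within an epoch, so these numbers count draws and not distinct recordings.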

The network architecture we will use is quite straightforward. We will stack dense layers on top of an LSTM layer, as follows:

model = Sequential()
model.add(LSTM(256, return_sequences=True, input_shape=input_shape,
               dropout=dropout))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(dropout))
model.add(Dense(n_classes, activation='softmax'))

Next, we set the optimizer and loss function, compile the model, and print a summary of our model:

# Recent Keras releases name this argument learning_rate instead of lr
opt = Adam(lr=learning_rate)
model.compile(loss='categorical_crossentropy', optimizer=opt,
              metrics=['accuracy'])
model.summary()
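
The summary lists the number of trainable parameters per layer. As a cross-check, the sketch below derives those counts by hand for this architecture, using the standard formulas for an LSTM layer with biases and for a dense layer. Note that with input_shape = (n_features, max_length), Keras reads the first axis as time steps, so the LSTM sees 20 steps of 80 values each. The value of n_classes is hypothetical, since it depends on the number of speakers in the data.

```python
def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # One weight per input-output pair plus one bias per output
    return input_dim * units + units

n_features, max_length = 20, 80
n_classes = 10    # hypothetical number of speakers

lstm = lstm_params(input_dim=max_length, units=256)
flat = n_features * 256           # Flatten: 20 time steps * 256 LSTM outputs
dense = dense_params(flat, 128)
out = dense_params(128, n_classes)
total = lstm + dense + out

print(lstm, flat, dense, out, total)
```

Most of the parameters sit in the first dense layer, because flattening the full LSTM sequence produces 5,120 inputs.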

To prevent overfitting, we will be using early stopping and automatically store the model that has the highest validation accuracy:

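# Monitor validation accuracy in both callbacks (ModelCheckpoint defaults to val_loss)
callbacks = [ModelCheckpoint('checkpoints/voice_recognition_best_model_{epoch:02d}.hdf5',
                             monitor='val_acc', save_best_only=True),
             EarlyStopping(monitor='val_acc', patience=2)]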
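
To clarify what patience=2 means, here is a simplified pure-Python sketch of the early stopping rule: training stops once the monitored value has failed to improve on its best score for patience consecutive epochs. This is an illustration of the idea, not the Keras source, and the accuracy values are made up.

```python
def early_stopping_epoch(val_accuracies, patience=2):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float('-inf')
    wait = 0                          # epochs since the last improvement
    for epoch, acc in enumerate(val_accuracies):
        if acc > best:
            best = acc                # new best score, reset the counter
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch          # no improvement for `patience` epochs
    return None                       # never triggered

# Made-up validation accuracies, for illustration only
made_up = [0.50, 0.60, 0.58, 0.59, 0.70]
stop = early_stopping_epoch(made_up, patience=2)
print(stop)
```

In this made-up run the best score of 0.60 is reached in the second epoch and the next two epochs stay below it, so training stops before the later jump to 0.70 is ever seen. A small patience saves time but can end training during a temporary plateau.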

We are ready to start training and we will store the results in history:

history = model.fit_generator(
    generator=batch_generator(X_train, batch_size),
    steps_per_epoch=steps_per_epoch,
    epochs=n_epochs,
    verbose=1,
    validation_data=batch_generator(X_val, 32),
    validation_steps=5,
    callbacks=callbacks
)

In the following figure, the training accuracy and validation accuracy are plotted against the epochs:

Figure 9.1: Training and validation accuracy
