The Deep Learning Architect's Handbook

By: Ee Kin Chin

Overview of this book

Deep learning enables previously unattainable feats in automation, but extracting real-world business value from it is a daunting task. This book will teach you how to build complex deep learning models and gain intuition for structuring your data to accomplish your deep learning objectives. This deep learning book explores every aspect of the deep learning life cycle, from planning and data preparation to model deployment and governance, using real-world scenarios that will take you through creating, deploying, and managing advanced solutions. You’ll also learn how to work with image, audio, text, and video data using deep learning architectures, as well as optimize and evaluate your deep learning models objectively to address issues such as bias, fairness, adversarial attacks, and model transparency. As you progress, you’ll harness the power of AI platforms to streamline the deep learning life cycle and leverage Python libraries and frameworks such as PyTorch, ONNX, Catalyst, MLFlow, Captum, Nvidia Triton, Prometheus, and Grafana to execute efficient deep learning architectures, optimize model performance, and streamline the deployment processes. You’ll also discover the transformative potential of large language models (LLMs) for a wide array of applications. By the end of this book, you'll have mastered deep learning techniques to unlock its full potential for your endeavors.
Table of Contents (25 chapters)

Part 1 – Foundational Methods
Part 2 – Multimodal Model Insights
Part 3 – DLOps

Developing deep learning models

Let’s start with a short recap of what deep learning is. The core building block of deep learning is the neural network, an algorithm designed to loosely mimic the human brain. Its building blocks are called neurons, after the billions of neurons the human brain contains. Neurons, in the context of neural networks, are objects that store simple pieces of information called weights and biases. Think of these as the memory of the algorithm.

Deep learning architectures are essentially neural network architectures with three or more layers. Neural network layers fall into three high-level groups – the input layer, the hidden layers, and the output layer. The input layer is the simplest group: its only role is to pass the input data on to subsequent layers. Its neurons can be considered passive – they hold no biases – but they still carry weights in their connections to the neurons of the next layer. Hidden layers comprise neurons that hold biases and carry weights in their connections to subsequent layers. Finally, the output layer comprises neurons, each with a bias, whose number depends on the problem type and the number of classes. A best practice when counting neural network layers is to exclude the input layer, so a network with one input layer, one hidden layer, and one output layer is considered a two-layer neural network. The following figure shows a basic neural network, called a multilayer perceptron (MLP), with a single input layer, a single hidden layer, and a single output layer:

Figure 1.12 – A simple deep learning architecture, also called an MLP
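To make the layer groups concrete, here is a minimal NumPy sketch of a two-layer MLP like the one in the figure. The layer sizes (4 inputs, 8 hidden neurons, 3 output classes) and the ReLU activation are illustrative assumptions, not the exact figure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 4 input features, 8 hidden neurons, 3 output classes.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # hidden layer weights and biases
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)   # output layer weights and biases

def relu(x):
    return np.maximum(0.0, x)

def forward(x):
    # The input layer simply passes the data on; its weights live in W1,
    # the connections to the hidden layer.
    h = relu(x @ W1 + b1)   # hidden layer: weights, biases, non-linearity
    return h @ W2 + b2      # output layer: one value per class

logits = forward(rng.normal(size=(2, 4)))  # a batch of 2 samples
print(logits.shape)  # (2, 3)
```

Note how the input layer contributes no biases of its own, matching the description above: its only parameters are the weights on its connections to the hidden layer.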

As a subset of the wider machine learning field, deep learning models learn patterns from data through a loss function and an optimizer algorithm that minimizes it. The loss function quantifies the error the model makes so that its memory (weights and biases) can be updated to perform better in the next iteration. The optimizer is the algorithm that decides how to update the weights given the loss value.
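The interplay of loss function and optimizer can be sketched with a deliberately tiny example: fitting a single weight to the toy relationship y = 2x with mean squared error and plain gradient descent (the data, learning rate, and step count are all illustrative choices):

```python
import numpy as np

# Toy data: learn y = 2x with a single weight w (the model's "memory").
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x

w = 0.0     # initial weight
lr = 0.01   # learning rate: the optimizer's main setting

for _ in range(200):
    pred = w * x
    loss = np.mean((pred - y) ** 2)      # loss function: quantifies the error
    grad = np.mean(2 * (pred - y) * x)   # gradient of the loss w.r.t. w
    w -= lr * grad                       # optimizer step: plain gradient descent

print(round(w, 3))  # ≈ 2.0
```

Every deep learning training loop is an elaboration of this pattern: compute predictions, measure the loss, and let the optimizer nudge the weights downhill.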

With this short recap, let’s dive into a summary of the common deep learning model families.

Deep learning model families

These layers come in many forms, as researchers continually invent new layer definitions to tackle new problem types, and they almost always come with a non-linear activation function that allows the model to capture non-linear relationships in the data. Along with this variety of layers come many deep learning architecture families, each suited to different problem types. A few of the most common and widely used are as follows:

  • MLP for tabular data types. This will be explored in Chapter 2, Designing Deep Learning Architectures.
  • Convolutional neural network for image data types. This will be explored in Chapter 3, Understanding Convolutional Neural Networks.
  • Autoencoders for anomaly detection, data compression, data denoising, and feature representation learning. This will be explored in Chapter 5, Understanding Autoencoders.
  • Gated recurrent unit (GRU), Long Short-Term Memory (LSTM), and Transformers for sequence data types. These will be explored in Chapter 4, Understanding Recurrent Neural Networks, and Chapter 6, Understanding Neural Network Transformers, respectively.
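A quick NumPy sketch shows why the non-linear activation mentioned above matters: without it, stacking any number of layers collapses into a single linear map (the sizes and seed here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(3, 5)), rng.normal(size=(5, 2))
x = rng.normal(size=(4, 3))

# Two stacked linear layers are equivalent to one linear layer:
stacked = (x @ W1) @ W2
collapsed = x @ (W1 @ W2)
print(np.allclose(stacked, collapsed))  # True: still linear

# Inserting a non-linearity (here ReLU) between the layers breaks this
# collapse, which is what lets deep models capture non-linear relationships.
nonlinear = np.maximum(0.0, x @ W1) @ W2
print(nonlinear.shape)
```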

These architectures will be the focus of Chapters 2 to 6, where we will discuss their methodology and work through some practical evaluation. Next, let’s look at the problem types we can tackle with deep learning.

The model development strategy

Today, deep learning models are easy to invent and create thanks to deep learning frameworks such as PyTorch and TensorFlow, along with their high-level library wrappers. Which framework you choose is largely a matter of interface preference, as both are mature after years of improvement. Only when you have a pressing need for a highly custom function to tackle a unique problem type will you need to pick the framework that can execute it. Once you’ve chosen a framework, model creation, training, and evaluation are well covered out of the box.

However, model management functions do not come out of the box with these frameworks. Model management is the area of technology that allows teams, businesses, and deep learning practitioners to reliably, quickly, and effectively build models, evaluate them, deliver model insights, deploy them to production, and govern them. Model management is sometimes referred to as machine learning operations (MLOps). You might still wonder why you’d need such functionality, especially if you’ve been building deep learning models on Kaggle, a platform that hosts data and machine learning problems as competitions. Here are some factors that drive the need for it:

  • It is cumbersome to compare models manually:
    • Manually typing performance data in an Excel sheet to keep track of model performance is slow and unreliable
  • Model artifacts are hard to keep track of:
    • A model has many artifacts, such as its trained weights, performance graphs, feature importance, and prediction explanations
    • It is also cumbersome to compare model artifacts
  • Model versioning is needed to make sure model-building experiments are not repeated:
    • Accidentally overwriting your top-performing model, along with its most reliable insights, is the last thing you want to experience
    • Versioning should depend on the data partitioning method, model settings, and software library versions
  • It is not straightforward to deploy and govern models
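To make the versioning point concrete, here is a toy, dependency-free sketch (illustrative only; a real team would use a tool such as MLFlow) that derives a deterministic version key from the data partitioning method, model settings, and library versions, so an experiment that was already run is never silently repeated:

```python
import hashlib
import json

def version_key(data_split: dict, model_settings: dict, library_versions: dict) -> str:
    """Derive a deterministic key so identical experiments collide
    instead of being unknowingly repeated."""
    payload = json.dumps(
        {"split": data_split, "model": model_settings, "libs": library_versions},
        sort_keys=True,  # stable ordering makes the hash reproducible
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:12]

registry = {}  # version key -> recorded metrics (a stand-in for a model registry)

key = version_key(
    {"method": "stratified", "test_size": 0.2},     # data partitioning
    {"hidden_layers": 1, "hidden_units": 8, "lr": 0.01},  # model settings
    {"numpy": "1.26.4"},                            # library versions
)
if key not in registry:                      # skip experiments already run
    registry[key] = {"val_accuracy": 0.91}   # hypothetical logged metric
print(key, registry[key])
```

All names, settings, and metrics here are hypothetical; the point is that changing any ingredient of the experiment changes the key, giving each run a distinct version.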

Depending on the size of the team involved in the project and how often components need to be reused, different software and libraries fit the bill. These tools split into paid and free (usually open source) categories. Metaflow, an open source tool, suits bigger data science teams where components are likely to be reused across projects, while MLFlow (also open source) is more suitable for small or single-person teams. Other notable model management tools are Comet (paid), Weights & Biases (paid), Neptune (paid), and Algorithmia (paid).

With that, we have provided a brief overview of deep learning model development methodology and strategy; we will dive deeper into model development topics in the next few chapters. But before that, let’s continue with an overview of the topic of delivering model insights.