Enhancing Deep Learning with Bayesian Inference

By: Matt Benatan, Jochem Gietema, Marian Schneider

Overview of this book

Deep learning has an increasingly significant impact on our lives, from suggesting content to playing a key role in mission- and safety-critical applications. As the influence of these algorithms grows, so does the concern for the safety and robustness of the systems that rely on them. Simply put, typical deep learning methods do not know when they don’t know. The field of Bayesian Deep Learning contains a range of methods for approximate Bayesian inference with deep networks. These methods help to improve the robustness of deep learning systems, as they tell us how confident they are in their predictions, allowing us to take more care in how we incorporate model predictions within our applications. Through this book, you will be introduced to the rapidly growing field of uncertainty-aware deep learning, developing an understanding of the importance of uncertainty estimation in robust machine learning systems. You will learn about a variety of popular Bayesian Deep Learning methods, and how to implement these through practical Python examples covering a range of application scenarios. By the end of the book, you will have a good understanding of Bayesian Deep Learning and its advantages, and you will be able to develop Bayesian Deep Learning models for safer, more robust deep learning systems.

2.1 Refreshing our knowledge of Bayesian modeling

Bayesian modeling is concerned with understanding the probability of an event occurring given some prior assumptions and some observations. The prior assumptions describe our initial beliefs, or hypothesis, about the event. For example, let’s say we have two six-sided dice, and we want to predict the probability that the sum of the two dice is 5. First, we need to understand how many possible outcomes there are. Because each die has 6 sides, the number of possible outcomes is 6 × 6 = 36. To work out the probability of rolling a sum of 5, we need to work out how many combinations of values will sum to 5:

Figure 2.1: Illustration of all values summing to five when rolling two six-sided dice

As we can see here, there are 4 combinations that add up to 5, thus the probability of having two dice produce a sum of 5 is 4/36, or 1/9. We call this initial belief the prior. Now, what happens if we incorporate information from an observation? Let’s say we know what the value for one of the dice will be – let’s say 3. This shrinks our number of possible values down to 6, as we only have the remaining die to roll, and for the result to be 5, we’d need this value to be 2.
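As a quick check, we can verify both of these probabilities by enumerating the outcome space directly. Here is a minimal Python sketch of that enumeration (our own illustration of the idea):

from itertools import product
from fractions import Fraction

# All 36 outcomes of rolling two fair six-sided dice
outcomes = list(product(range(1, 7), repeat=2))

# Prior: the fraction of outcomes whose sum is 5
prior = Fraction(sum(1 for d1, d2 in outcomes if d1 + d2 == 5), len(outcomes))
print(prior)  # 1/9, in other words 4/36

# Conditioning on the observation d1 = 3 leaves only 6 possible outcomes
observed = [(d1, d2) for d1, d2 in outcomes if d1 == 3]
posterior = Fraction(sum(1 for d1, d2 in observed if d1 + d2 == 5), len(observed))
print(posterior)  # 1/6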

Figure 2.2: Illustration of remaining value, which sums to five after rolling the first die

Because we assume our die is fair, the probability of the sum of the dice being 5 is now 1/6. This probability, called the posterior, is obtained using information from our observation. At the core of Bayesian statistics is Bayes’ rule (hence “Bayesian”), which we use to determine the posterior probability given some prior knowledge. Bayes’ rule is defined as:

$$P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B)}$$

where we can define P(A|B) as P(d1 + d2 = 5|d1 = 3), with d1 and d2 representing dice 1 and 2 respectively. We can see this in action using our previous example. Starting with the likelihood, that is, the left-hand term of our numerator, we see that:

$$P(B \mid A) = P(d_1 = 3 \mid d_1 + d_2 = 5) = \frac{1}{4}$$

We can verify this by looking at our grid. Moving to the second part of the numerator – the prior – we see that:

$$P(A) = P(d_1 + d_2 = 5) = \frac{4}{36} = \frac{1}{9}$$

In the denominator, we have our normalization constant (also referred to as the marginal likelihood), which is simply:

$$P(B) = P(d_1 = 3) = \frac{1}{6}$$

Putting this all together using Bayes’ theorem, we have:

$$P(d_1 + d_2 = 5 \mid d_1 = 3) = \frac{\frac{1}{4} \times \frac{1}{9}}{\frac{1}{6}} = \frac{1}{6}$$
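We can reproduce this calculation with exact arithmetic; the following is a minimal sketch using Python’s fractions module, plugging in the probabilities derived above:

from fractions import Fraction

likelihood = Fraction(1, 4)  # P(d1 = 3 | d1 + d2 = 5)
prior = Fraction(1, 9)       # P(d1 + d2 = 5)
evidence = Fraction(1, 6)    # P(d1 = 3), the normalization constant

posterior = likelihood * prior / evidence
print(posterior)  # 1/6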

What we have here is the probability of the outcome being 5 if we know one die’s value. However, in this book, we’ll often be referring to uncertainties rather than probabilities – and learning methods to obtain uncertainty estimates with DNNs. These methods belong to the broader field of uncertainty quantification, and aim to quantify the uncertainty in the predictions from an ML model. That is, we want to predict P(ŷ|θ), where ŷ is a prediction from a model, and θ represents the parameters of the model.

As we know from fundamental probability theory, probabilities are bound between 0 and 1. The closer we are to 1, the more likely – or probable – the event is. We can view our uncertainty as the result of subtracting our probability from 1. In the context of the example here, the probability of the sum being 5 is P(d1 + d2 = 5|d1 = 3) = 1/6 ≈ 0.167. So, our uncertainty is simply 1 − 1/6 = 5/6 ≈ 0.833, meaning that there’s a > 80% chance that the outcome will not be 5. As we proceed through the book, we’ll learn about different sources of uncertainty, and how uncertainties can help us to develop more robust deep learning systems.
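In code, this conversion from probability to uncertainty is a single subtraction; a minimal sketch:

from fractions import Fraction

p = Fraction(1, 6)          # P(d1 + d2 = 5 | d1 = 3)
uncertainty = 1 - p         # 5/6
print(float(uncertainty))   # 0.8333..., a > 80% chance the sum is not 5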

Let’s continue using our dice example to build a better understanding of model uncertainty estimates. Many common machine learning models work on the basis of maximum likelihood estimation, or MLE. That is, they look to predict the value that is most likely: tuning their parameters during training to produce the most likely outcome ŷ given some input x. As a simple illustration, let’s say we want to predict the value of d1 + d2 given a value of d1. We can simply define this as the expectation of d1 + d2 conditioned on d1:

$$\hat{y} = \mathbb{E}[d_1 + d_2 \mid d_1]$$

That is, the mean of the possible values of d1 + d2.

Setting d1 = 3, our possible values for d1 + d2 are {4,5,6,7,8,9} (as illustrated in Figure 2.2), making our mean:

$$\mu = \frac{1}{6}\sum_{i=1}^{6} a_i = \frac{4 + 5 + 6 + 7 + 8 + 9}{6} = 6.5$$

This is the value we’d get from a simple linear model, such as a linear regression defined by:

$$\hat{y} = \beta x + \xi$$

In this case, the values of our slope and bias are β = 1 and ξ = 3.5. If we change our value of d1 to 1, we see that this mean changes to 4.5 – the mean of the set of possible values of d1 + d2|d1 = 1, in other words {2,3,4,5,6,7}. This perspective on our model predictions is important: while this example is very straightforward, the same principle applies to far more sophisticated models and data. The value we typically see with ML models is the expectation, otherwise known as the mean. As you are likely aware, the mean is often referred to as the first statistical moment – with the second central moment being the variance, and the variance allows us to quantify uncertainty.
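We can check this equivalence numerically. The sketch below computes the conditional expectation by enumerating the possible sums for a fair d2, and compares it with the linear model above (the names expected_sum and linear_model are our own):

def expected_sum(d1: int) -> float:
    """E[d1 + d2 | d1], computed by enumerating a fair six-sided d2."""
    sums = [d1 + d2 for d2 in range(1, 7)]
    return sum(sums) / len(sums)

def linear_model(x: float, beta: float = 1.0, xi: float = 3.5) -> float:
    """The equivalent linear regression: y = beta * x + xi."""
    return beta * x + xi

for d1 in (3, 1):
    print(d1, expected_sum(d1), linear_model(d1))
# 3 6.5 6.5
# 1 4.5 4.5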

The variance for our simple example is defined as follows:

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(a_i - \mu)^2$$

These statistical moments should be familiar to you, as should the fact that the variance here is represented as the square of the standard deviation, σ. Note that because we are averaging over the complete set of equally likely outcomes, rather than a sample, we divide by n (the population variance). For our example here, for which we assume d2 is a fair die, the variance will always be constant: σ² = 2.917. That is to say, given any value of d1, we know that the values of d2 are all equally likely, so the uncertainty does not change. But what if we have an unfair die d2, which has a 50% chance of landing on a 6, and a 10% chance of landing on each other number? This changes both our mean and our variance. We can see this by looking at how we would represent this as a set of possible values (in other words, a perfect sample of the die) – the set of possible values for d1 + d2|d1 = 1 now becomes {2,3,4,5,6,7,7,7,7,7}. Our new model will now have a bias of ξ = 4.5, making our prediction:

$$\hat{y} = 1 \times 1 + 4.5 = 5.5$$

We see that the expectation has increased due to the change in the underlying probability of the values of die d2. However, the important difference here is in the change in the variance value:

$$\sigma^2 = \frac{1}{10}\sum_{i=1}^{10}(a_i - \mu)^2 = 3.25$$

Our variance has increased. As variance essentially gives us the average of the squared distance of each possible value from the mean, this shouldn’t be surprising: given the weighted die, it’s more likely that the outcome will be distant from the mean than with an unweighted die, and thus our variance increases. To summarize, in terms of uncertainty: the greater the likelihood that the outcome will be further from the mean, the greater the uncertainty.
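To confirm these values, the sketch below computes the population variance of the possible sums given d1 = 1, for both the fair die and the weighted die (represented, as above, by ten equally likely values):

def population_variance(values) -> float:
    """Variance over a complete set of equally likely outcomes (divide by n)."""
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values) / len(values)

fair = [1 + d2 for d2 in range(1, 7)]       # {2, 3, 4, 5, 6, 7}
weighted = [2, 3, 4, 5, 6, 7, 7, 7, 7, 7]   # d2 lands on 6 half the time

print(round(population_variance(fair), 3))      # 2.917
print(round(population_variance(weighted), 3))  # 3.25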

This has important implications for how we interpret predictions from machine learning models (and statistical models more generally). If our predictions are an approximation of the mean, and our uncertainty quantifies how likely it is for an outcome to be distant from the mean, then our uncertainty tells us how likely it is that our model prediction is incorrect. Thus, model uncertainties allow us to decide when to trust the predictions, and when we should be more cautious.

The examples given here are very basic, but should help to give you an idea of what we’re looking to achieve with model uncertainty quantification. We will continue to explore these concepts as we learn about some of the benchmark methods for Bayesian inference, learning how these concepts apply to more complex, real-world problems. We’ll start with perhaps the most fundamental method of Bayesian inference: sampling.