Enhancing Deep Learning with Bayesian Inference

By: Matt Benatan, Jochem Gietema, Marian Schneider

Overview of this book

Deep learning has an increasingly significant impact on our lives, from suggesting content to playing a key role in mission- and safety-critical applications. As the influence of these algorithms grows, so does the concern for the safety and robustness of the systems that rely on them. Simply put, typical deep learning methods do not know when they don’t know. The field of Bayesian Deep Learning contains a range of methods for approximate Bayesian inference with deep networks. These methods help to improve the robustness of deep learning systems as they tell us how confident they are in their predictions, allowing us to take more care in how we incorporate model predictions within our applications. Through this book, you will be introduced to the rapidly growing field of uncertainty-aware deep learning, developing an understanding of the importance of uncertainty estimation in robust machine learning systems. You will learn about a variety of popular Bayesian Deep Learning methods, and how to implement these through practical Python examples covering a range of application scenarios. By the end of the book, you will have a good understanding of Bayesian Deep Learning and its advantages, and you will be able to develop Bayesian Deep Learning models for safer, more robust deep learning systems.

6.2 Introducing approximate Bayesian inference via dropout

Dropout is traditionally used to prevent a neural network (NN) from overfitting. First introduced in 2012, it is now part of many common NN architectures and is one of the simplest and most widely used regularization methods. The idea of dropout is to randomly turn off (or drop) certain units of a neural network during training. Because of this, the model cannot rely solely on a particular small subset of neurons to solve the task it was given. Instead, the model is forced to find different ways to solve its task. This improves the robustness of the model and makes it less likely to overfit.
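
To make this concrete, here is a minimal sketch (not taken from the book's own examples) of how dropout is typically added as a layer in a Keras model; the layer sizes, input shape, and drop rate are illustrative assumptions:

```python
# Minimal illustrative sketch: dropout as a regularization layer in Keras.
# The architecture, input shape, and drop rate are assumptions for
# demonstration only.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(20,)),
    # During training, each unit's output is set to zero with probability 0.5
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
```

Note that dropout is only active during training: at inference time, Keras disables it automatically unless the layer is explicitly called with training=True.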

If we simplify a network to y = Wx, where y is the output of our network, x the input, and W our model weights, we can think of dropout as:

$$\hat{w}_j = \begin{cases} w_j, & \text{with probability } p \\ 0, & \text{otherwise} \end{cases}$$

where $\hat{w}_j$ is the new weight after applying dropout, $w_j$ is the weight before applying dropout, and $p$ is the probability of not applying dropout (that is, of keeping the weight).
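
As a rough NumPy sketch of this equation (with illustrative variable names, not the book's code), the weights can be masked as follows:

```python
# Illustrative NumPy sketch of the dropout equation above: each weight is
# kept with probability p and set to zero otherwise.
import numpy as np

rng = np.random.default_rng(seed=0)

def dropout_weights(W, p=0.5):
    """Return a copy of W where each entry is kept with probability p."""
    mask = rng.random(W.shape) < p   # True (keep) with probability p
    return np.where(mask, W, 0.0)

W = rng.normal(size=(4, 3))          # weights before dropout
W_hat = dropout_weights(W, p=0.5)    # weights after dropout
x = rng.normal(size=3)
y = W_hat @ x                        # y = Wx with the dropped-out weights
```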

The original dropout paper recommends randomly dropping 50% of the units in...