Bayesian inference provides a unified framework for handling uncertainty when machine learning models learn patterns from data and use those patterns to predict future observations. However, learning and implementing Bayesian models is not easy for data science practitioners because of the level of mathematical treatment involved. Applying Bayesian methods to real-world problems also demands substantial computational resources. With recent advances in cloud and high-performance computing, and easier access to computational resources, Bayesian modeling has become more feasible for practical applications. It is therefore advantageous for data scientists and data engineers to understand Bayesian methods and apply them in their projects to achieve better results.

This book gives comprehensive coverage of the Bayesian machine learning models and the R packages that implement them. It begins with an introduction to the fundamentals of probability theory and R programming for those who are new to the subject. Then, the book covers some of the most important machine learning methods, both supervised learning and unsupervised learning, implemented using Bayesian inference and R. Every chapter begins with a theoretical description of the method, explained in a very simple manner. Then, relevant R packages are discussed and some illustrations using datasets from the UCI machine learning repository are given. Each chapter ends with some simple exercises for you to get hands-on experience of the concepts and R packages discussed in the chapter. The state-of-the-art topics covered in the chapters are Bayesian regression using linear and generalized linear models, Bayesian classification using logistic regression, classification of text data using Naïve Bayes models, and Bayesian mixture models and topic modeling using Latent Dirichlet allocation.

The last two chapters are devoted to the latest developments in the field. One chapter discusses deep learning, which uses a class of neural network models that are currently at the frontier of artificial intelligence. The book concludes with the application of Bayesian methods on Big Data using frameworks such as Hadoop and Spark.

Chapter 1, *Introducing the Probability Theory*, covers the foundational concepts of probability theory, particularly those aspects required for learning Bayesian inference, which are presented to you in a simple and coherent manner.

Chapter 2, *The R Environment*, introduces you to the R environment. After reading this chapter, you will know how to import data into R, select subsets of data for analysis, and write simple R programs using functions and control structures. You will also become familiar with the graphical capabilities of R and some advanced capabilities such as loop functions.

Chapter 3, *Introducing Bayesian Inference*, introduces you to the Bayesian statistical framework. This chapter includes a description of Bayes' theorem, concepts such as prior and posterior probabilities, and different methods to estimate the posterior distribution, such as MAP estimates, Monte Carlo simulations, and variational estimates.

Chapter 4, *Machine Learning Using Bayesian Inference*, gives an overview of what machine learning is and what some of its high-level tasks are. This chapter also discusses the importance of Bayesian inference in machine learning, particularly how it can help to avoid issues such as model overfitting and how it can be used to select optimal models.

Chapter 5, *Bayesian Regression Models*, presents one of the most common supervised machine learning tasks, namely regression modeling, in the Bayesian framework. It uses an example to show how you can get tighter confidence intervals of prediction using Bayesian regression models.

Chapter 6, *Bayesian Classification Models*, presents how to use the Bayesian framework for another common machine learning task, classification. The two Bayesian models of classification, Naïve Bayes and Bayesian logistic regression, are discussed along with some important metrics for evaluating the performance of classifiers.

Chapter 7, *Bayesian Models for Unsupervised Learning*, introduces you to the concepts behind unsupervised and semi-supervised machine learning and their Bayesian treatment. The two most important Bayesian unsupervised models, the Bayesian mixture model and LDA, are discussed.

Chapter 8, *Bayesian Neural Networks*, presents an important class of machine learning model, namely neural networks, and their Bayesian implementation. Neural network models are inspired by the architecture of the human brain and they continue to be an area of active research and development. The chapter also discusses deep learning, one of the latest advances in neural networks, which is used to solve many problems in computer vision and natural language processing with remarkable accuracy.

Chapter 9, *Bayesian Modeling at Big Data Scale*, covers various frameworks for performing large-scale Bayesian machine learning such as Hadoop, Spark, and parallelization frameworks that are native to R. The chapter also discusses how to set up instances on cloud services, such as Amazon Web Services and Microsoft Azure, and run R programs on them.

To follow the examples and try the exercises presented in this book, you need to install the latest version of the R programming environment and the RStudio IDE. In addition, you need to install the specific R packages that are mentioned in each chapter.
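The package setup described above can be sketched as follows. This is only a minimal illustration: `missing_packages` is a hypothetical helper, not something the book provides, and the package names in `needed` are placeholders to be replaced with the packages named in the chapter you are working through.

```r
# Hypothetical helper: return the names of requested packages
# that are not currently installed.
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Placeholder package names (assumption); substitute the packages
# mentioned in the chapter you are reading.
needed <- c("stats", "utils")
to_install <- missing_packages(needed)

# Install only what is missing; this step needs network access to a CRAN mirror.
if (length(to_install) > 0) {
  install.packages(to_install)
}
```

Checking before installing avoids downloading packages that are already present, which is convenient when you work through the chapters one at a time.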

This book is intended for data scientists who analyze large datasets to generate insights and for data engineers who develop platforms, solutions, or applications based on machine learning. Although many data science practitioners are quite familiar with machine learning techniques and R, they may not know about Bayesian inference and its merits. This book can therefore help even experienced data scientists and data engineers learn about Bayesian methods and incorporate them into their projects to get better results. No prior experience in R or probability theory is required to use this book.

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "The first function is `gibbs_met`."

A block of code is set as follows:

    myMean <- function(x){
      s <- sum(x)
      l <- length(x)
      mean <- s/l
      mean
    }
    > x <- c(10,20,30,40,50)
    > myMean(x)
    [1] 30

Any command-line input or output is written as follows:

**setwd("directory path")**

**New terms** and **important words** are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "You can also set this from the menu bar of RStudio by clicking on **Session** | **Set Working Directory**."

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail `<[email protected]>`, and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the **Errata Submission Form** link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the **Errata** section.

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at `<[email protected]>` with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

If you have a problem with any aspect of this book, you can contact us at `<[email protected]>`, and we will do our best to address the problem.