Book Image

Building Probabilistic Graphical Models with Python

By : Kiran R Karkera
Book Image

Building Probabilistic Graphical Models with Python

By: Kiran R Karkera

Overview of this book

<p>With the increasing prominence in machine learning and data science applications, probabilistic graphical models are a new tool that machine learning users can use to discover and analyze structures in complex problems. The variety of tools and algorithms under the PGM framework extend to many domains such as natural language processing, speech processing, image processing, and disease diagnosis.</p> <p>You've probably heard of graphical models before, and you're keen to try out new landscapes in the machine learning area. This book gives you enough background information to get started on graphical models, while keeping the math to a minimum.</p>
Table of Contents (15 chapters)
Building Probabilistic Graphical Models with Python
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Joint distribution


We have seen that the marginal distribution is a distribution that describes a subset of random variables. Next, we will discuss a distribution that describes all the random variables in the set. This is called a joint distribution. Let us look at the joint distribution that involves the Degree score and Experience random variables in the job hunt example:

Degree score

Relevant Experience

 
 

Highly relevant

Not relevant

 

Poor

0.1

0.1

0.2

Average

0.1

0.4

0.5

Excellent

0.2

0.1

0.3

 

0.4

0.6

1

The values within the dark gray-colored cells are of the joint distribution, and the values in the light gray-colored cells are of the marginal distribution (sometimes called that because they are written on the margins). It can be observed that the individual marginal distributions sum up to 1, just like a normal probability distribution.

Once the joint distribution is described, the marginal distribution can be found by summing up individual rows or columns. In the preceding table, if we sum up the columns, the first column gives us the probability for Highly relevant, and the second column for Not relevant. It can be seen that a similar tactic applied on the rows gives us the probabilities for degree scores.