# Gaussian Mixture

In *Chapter 3*, *Introduction to Semi-Supervised Learning*, we discussed the Generative Gaussian Mixture model in the context of semi-supervised learning. In this section, we're going to apply the EM algorithm to derive the formulas for the parameter updates.

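Let's start by considering a dataset $X$ drawn from a data-generating process $p_{data}$:

$$X = \{\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_M\} \quad \text{where} \quad \bar{x}_i \in \mathbb{R}^n \text{ and } \bar{x}_i \sim p_{data}$$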

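We assume that the whole distribution is generated by a weighted sum of $k$ Gaussian distributions, so that the probability of each sample can be expressed as follows:

$$p(\bar{x}_i) = \sum_{j=1}^{k} P(N = j)\, \mathcal{N}(\bar{x}_i \mid \bar{\mu}_j, \Sigma_j) = \sum_{j=1}^{k} w_j\, \mathcal{N}(\bar{x}_i \mid \bar{\mu}_j, \Sigma_j)$$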

In the previous expression, the term $w_j = P(N = j)$ is the relative weight of the $j$-th Gaussian, while $\bar{\mu}_j$ and $\Sigma_j$ are its mean vector and covariance matrix. For consistency with the laws of probability, we also need to impose the following constraint on the weights:

$$\sum_{j=1}^{k} w_j = 1$$
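To make the model concrete, here is a minimal sketch that samples a dataset from a known mixture and evaluates both the mixture density of each sample and the log-likelihood of the whole dataset. It assumes NumPy and SciPy. The weights, means, and covariances are hypothetical values chosen only for illustration, and the helper names `sample_mixture`, `mixture_density`, and `log_likelihood` are likewise illustrative rather than taken from the chapter's own code. Note how the log-likelihood contains, for every sample, the logarithm of a sum over the $k$ components.

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

# Hypothetical 2-D mixture with k = 3 components (illustrative parameters only)
weights = np.array([0.5, 0.3, 0.2])  # w_j = P(N = j)
means = np.array([[0.0, 0.0], [4.0, 4.0], [-4.0, 3.0]])  # mu_j
covariances = np.array([np.eye(2), 0.5 * np.eye(2), [[1.0, 0.3], [0.3, 1.0]]])  # Sigma_j

# Constraint on the weights: they must sum to 1
assert np.isclose(weights.sum(), 1.0)


def sample_mixture(n_samples, rng):
    # First draw the latent component N = j for each sample, then x ~ N(mu_j, Sigma_j)
    components = rng.choice(len(weights), size=n_samples, p=weights)
    X = np.array([rng.multivariate_normal(means[j], covariances[j]) for j in components])
    return X, components


def mixture_density(X):
    # p(x_i) = sum_j w_j * N(x_i | mu_j, Sigma_j), evaluated for every row of X
    return sum(w * multivariate_normal(mean=m, cov=c).pdf(X)
               for w, m, c in zip(weights, means, covariances))


def log_likelihood(X):
    # log L = sum_i log sum_j w_j * N(x_i | mu_j, Sigma_j)
    # Each column holds log(w_j) + log N(x_i | mu_j, Sigma_j); logsumexp gives the log of the inner sum
    log_terms = np.stack([np.log(w) + multivariate_normal(mean=m, cov=c).logpdf(X)
                          for w, m, c in zip(weights, means, covariances)], axis=1)
    return logsumexp(log_terms, axis=1).sum()


rng = np.random.default_rng(0)
X, labels = sample_mixture(500, rng)
print(mixture_density(X[:3]))  # densities of the first three samples
print(log_likelihood(X))        # log-likelihood of the whole dataset
```

Computing the inner sum in log space with `logsumexp` avoids the underflow that can occur when the individual Gaussian densities are very small; mathematically it yields the same value as `np.log(mixture_density(X)).sum()`.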

Unfortunately, if we try to maximize the log-likelihood of $X$ directly, we need to manage the logarithm of a sum (for each sample, the log of the weighted sum over all $k$ components), and the procedure becomes very complex. However, we have learned that it's possible to use latent variables as helpers whenever...