In the preceding equation,
are the parameters determined during training. For example, in the context of the celebrity faces dataset, this is equivalent to finding a distribution that can draw faces. Similarly, in the MNIST dataset, this distribution can generate recognizable handwritten digits.
In machine learning, to perform a certain level of inference, we're interested in finding
, a joint distribution between inputs, x, and the latent variables, z. The latent variables are not part of the dataset but instead encode certain properties observable from inputs. In the context of celebrity faces, these might be facial expressions, hairstyles, hair color, gender, and so on. In the MNIST dataset, the latent variables may represent the digit and writing styles.
is practically a distribution of input data points and their attributes...