In the preceding chapter, we learned how to estimate the parameters and structure of a Bayesian network from data. In this chapter, we will focus on learning the parameters and structure of a Markov network. As it turns out, the learning task is more difficult in the case of Markov networks because of the partition function that appears in the probability distribution. Since the partition function depends on all the factors of the network, it doesn't allow us to decompose the optimization objective into separate terms, as we could for Bayesian networks. Therefore, we have to use iterative methods over the objective function to find the optimal point in the parameter space.
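To make the coupling introduced by the partition function concrete, here is a minimal sketch (a hypothetical example, not taken from this chapter) of maximum likelihood estimation for a tiny log-linear Markov network over two binary variables. Note that the partition function Z sums over every joint assignment, so every gradient step requires inference over the whole model, which is exactly why an iterative method is needed:

```python
import numpy as np

# Hypothetical toy model: two binary variables A and B with log-linear
# features [A=1, B=1, A=B=1] and parameters theta.
def features(a, b):
    return np.array([a, b, a * b], dtype=float)

STATES = [(a, b) for a in (0, 1) for b in (0, 1)]

def log_partition(theta):
    # Z couples all parameters: it sums over every joint assignment,
    # so no single term of the likelihood depends on only one factor.
    return np.log(sum(np.exp(theta @ features(a, b)) for a, b in STATES))

def model_expected_features(theta):
    logZ = log_partition(theta)
    probs = [np.exp(theta @ features(a, b) - logZ) for a, b in STATES]
    return sum(p * features(a, b) for p, (a, b) in zip(probs, STATES))

def fit(data, lr=0.5, steps=2000):
    # Gradient of the average log-likelihood:
    #   empirical feature mean - expected feature mean under the model.
    # Computing the second term requires inference at every iteration.
    emp = np.mean([features(a, b) for a, b in data], axis=0)
    theta = np.zeros(3)
    for _ in range(steps):
        theta += lr * (emp - model_expected_features(theta))
    return theta

# Toy dataset with positive counts for all four joint states.
data = [(1, 1)] * 6 + [(0, 0)] * 6 + [(1, 0)] * 2 + [(0, 1)] * 2
theta = fit(data)
```

At convergence, the model's expected feature counts match the empirical feature counts, which is the moment-matching characterization of the maximum likelihood solution. Here Z is computed by brute-force enumeration, which is only feasible for tiny models; for larger networks this is where approximate inference, discussed later in the chapter, comes in.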
In this chapter, we will discuss the following topics:
Maximum likelihood parameter estimation
Learning with approximate inference
Structure learning