Expectation-Maximization algorithm
In this article, I will collect my notes on the Expectation-Maximization (EM) algorithm, based on lectures 12 and 13 of Andrew Ng's online course. Given a set of unlabeled data points, EM iteratively tries to determine the distribution of the data, assuming that every data point carries an implicit label (an unobserved latent variable). For simplicity, we shall limit ourselves to the case where there are only finitely many possible labels.
Description of the problem
Given a set of unlabeled data \(\{x^{(1)}, \dots, x^{(m)}\}\), our goal is to determine \(P(x)\), the distribution of \(x\), with the following assumptions.
1. Each data point \(x^{(i)}\) is implicitly associated with an unobserved label \(z^{(i)}\), which takes one of finitely many values \(1, \dots, j\).
2. The labels follow a multinomial distribution, \(P(z = l) = \phi_l\) with \(\phi_l \ge 0\) and \(\sum_{l=1}^{j} \phi_l = 1\), and conditional on \(z = l\) the data point is drawn from a distribution \(P(x \mid z = l; a_l)\) parametrized by \(a_l\).

Assumptions 1 and 2 give us a set of parameters \(\theta = (\phi_1, \dots, \phi_j, a_1,\dots, a_j)\) and the model

\[ P(x; \theta) = \sum_{l=1}^{j} P(x \mid z = l; a_l)\,\phi_l. \]
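To make the setup concrete, here is a minimal sketch of one special case, assuming (purely for illustration) that each conditional distribution is a univariate Gaussian, so that \(a_l = (\mu_l, \sigma_l)\) and

\[ P(x; \theta) = \sum_{l=1}^{j} \phi_l \,\mathcal{N}(x; \mu_l, \sigma_l^2). \]

The short Python snippet below evaluates this mixture density at a point; the mixing weights, means, and standard deviations are hypothetical values chosen only for the example, not anything taken from the lectures.

```python
import numpy as np

# Hypothetical two-component 1-D Gaussian mixture, used only to illustrate
# the model P(x; theta) = sum_l phi_l * P(x | z = l; a_l).
phi = np.array([0.3, 0.7])    # mixing proportions phi_1, phi_2
mu = np.array([-2.0, 1.0])    # component means (each a_l is (mu_l, sigma_l))
sigma = np.array([0.5, 1.5])  # component standard deviations

def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def mixture_density(x):
    """P(x; theta): weighted sum of the component densities at x."""
    return sum(p * gaussian_pdf(x, m, s) for p, m, s in zip(phi, mu, sigma))

# Example: density of the mixture at a single point.
print(mixture_density(0.0))
```

The logarithm of this quantity, summed over all data points, is exactly the log likelihood that we set out to maximize below.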
We want to find this set of parameters so that the likelihood function

\[ L(\theta) = \prod_{i=1}^{m} P(x^{(i)}; \theta) \]

is maximized, or equivalently, so that the log likelihood function below is maximized:

\[ \ell(\theta) = \sum_{i=1}^{m} \log P(x^{(i)}; \theta), \]
where