Expectation-Maximization algorithm

In this article, I will collect my notes on Expectation-Maximization algorithm (EM) based on lecture 12 and 13 of Andrew Ng's online course. Given a set of unlabeled data points EM tries iteratively to determine the distribution of data, assuming that all data points are implicitly labeled (unobserved latent variables). For simplicity, we shall limit ourselves to the case where there are only finitely many implicit labels.

Description of the problem

Given a set of unlabeled data \(\{x^{(1)}, \dots, x^{(m)}\}\), our goal is to determine \(P(x)\), the distribution of \(x\), with the following assumptions.


  1. There are finitely many unobserved latent variables \(z \in \{1, \dots, k\}\) and they obey some multinomial distribution, i.e., \(P(z=j) = \phi_j\) with \(\sum \phi_j = 1\).

  2. \(\{P(x|z=j; a_j): j=1, \dots, k\}\) are a family of uniformly parametrized distribution.

Assumptions 1 and 2 will gives us a set of parameters \(\theta = (\phi_1, \dots, \phi_j, a_1,\dots, a_j)\) and

$$\begin{equation} P(x; \theta) = \sum_{j=1}^k P(x|z=j; \theta)P(z=j; \theta). \label{px} \end{equation}$$

We want to find this set of parameters so that the likelihood function

$$L(\theta) = \prod_{i=1}^m P(x^{(i)}) = \prod_{i=1}^m \sum_{j=1}^k P(x^{(i)}|z=j; \theta)P(z=j; \theta).$$

is maximized. Or equivalently, the log likelihood function below is maximized:

$$\begin{equation} l(\theta) = \sum_{i=1}^m \log\left(\sum_{j=1}^k P(x^{(i)}, z=j; \theta)\right), \label{log-likelihood} \end{equation}$$


$$P(x^{(i)}, z=j; \theta) = P(x^{(i)}|z=j; \theta)P(z=j; \theta).$$

Read More

LeetCode Contest 57


第一题Longest Word in Dictionary


x[0:1], x[0:2], ..., x[0:len(x)]



class Solution(object):
    def longestWord(self, words):
        :type words: List[str]
        :rtype: str
        words.sort(key=lambda x: len(x))
        tree = {'#': {}}
        ans = ''
        for word in words:
            p = tree
            ok = True
            for c in word:
                if c not in p:
                    p[c] = {}
                ok = ok and '#' in p
                p = p[c]
            p['#'] = {}
            if ok and (len(word) > len(ans) or word < ans):
                    ans = word
        return ans

Read More

Digit recognition, Softmax

在这篇博客中,我将使用softmax模型来识别手写数字。文章的第一部分是关于softmax模型的理论推导,而第二部分则是模型的实现。softmax的本质是一个线性模型,所以推导所需要的理论在我之前的一篇博客Generalized Linear Model已经详细介绍过了。softmax是逻辑回归(logistic regression)的推广:逻辑回归使用Bernoulli分布(二项分布),而softmax使用多项分布。

Read More

LeetCode Contest 54

这次比赛之后,下次再遇到O(n2)的题目,而且n < 1000的时候,一定要用C++!这是泪的教训啊😭

第一题Degree of an Array


在统计每个数字频率的同时,记录这个数字出现的最左和最右位置(l, r)。那么包含这个数字的最短数组的长度为r-l+1。

class Solution(object):
    def findShortestSubArray(self, nums):
        :type nums: List[int]
        :rtype: int
        cnt = {}
        f = 0
        for i, n in enumerate(nums):
            if n not in cnt:
                cnt[n] = [1, i, i]
                c, a, _ = cnt[n]
                cnt[n] = [c+1, a, i]
            f = max(f, cnt[n][0])
        return min(v[2]-v[1]+1 for v in cnt.values() if v[0]==f)

Read More

LeetCode Contest 53


第一题Binary Number with Alternating Bits



class Solution(object):
    def hasAlternatingBits(self, n):
        :type n: int
        :rtype: bool
        b = bin(n)[2:]
        return all(b[i]!=b[i+1] for i in range(len(b)-1))

Read More

Support Vector Machine

This article is my notes on support vector machine for Lecture 7 and 8 of Machine Learning by Andrew Ng.


In a binary classification problem, we can use logistic regression

$$h_\theta(x) = \frac{1}{1+e^{-\theta^T x}} = g(\theta^T x),$$

where \(g\) is the sigmoid function with a figure of it below.

Then given input \(x\), the model predicts \(1\) if and only if \(\theta^x \ge 0\), in which case \(h_\theta(x) = g(\theta^T x) \ge 0.5\); and it predicts \(0\) if and only if \(\theta^T x < 0\). Moreover, based on the shape of sigmoid function, if \(\theta^T x >> 0\), we are very confident that \(y=1\). Likewise, if \(\theta^T x << 0\), we are very confident that \(y=0\). Therefore, we hope that for the training set \(\{(x^{(i)}, y^{(i)})\}_{i=1}^m\), we can find such a \(\theta\) that \(\theta^T x^{(i)} >> 0\) if \(y^{(i)}=1\) and \(\theta^T x^{(i)} << 0\) if \(y^{(i)}=0\).

Read More

Generative Model

This article is my notes on generative model for Lecture 5 and 6 of Machine Learning by Andrew Ng. What we do in logistic regression using generalized linear model is that, we approximate \(P(y|x)\) using given data. This kind of learning algorithms is discriminative, in which we predict \(y\) based on the input features \(x\). On the contrary, generative model is to model \(P(x|y)\), the probability of the features \(x\) given class \(y\). In other words, we want to study how the features structure looks like given a class \(y\). If we also learn what \(P(y)\) is, we can easily recover \(P(y|x)\), for example, in the binary classification problem,

$$\begin{equation} P(y=1|x) = \frac{P(x|y=1)P(y=1)}{P(x)}, \label{eqn:bayes} \end{equation}$$

where \(P(x) = P(x|y=0)P(y=0) + P(x|y=1)P(y=1)\).

In this article, we are going to see a simple example of generative model on Gaussian discriminant analysis and Naive Bayes.

Read More

LeetCode Contest 52


第一题Repeated String Match



i = (len(A) + len(B) - 1) / len(A)

最大次数j的上界是2i。然后对[i, j]这个区间进行二分查找。


class Solution(object):
    def repeatedStringMatch(self, A, B):
        :type A: str
        :type B: str
        :rtype: int
        i = (len(A) + len(B) - 1) / len(A)
        j = 2 * i
        ans = -1
        while i <= j:
            k = (i+j) / 2
            tmp = A * k
            if tmp.find(B) != -1:
                ans = k
                j = k - 1
                i = k + 1
        return ans

Read More

Generalized Linear Model (Examples)

This article is a companion article to my another post Generalized Linear Model. In this article, I will implement some of the learning algorithms in Generalized Linear Model. To be more specific, I will do some examples on linear regression and logistic regression. With some effort, google search gives me some very good example data sets to work with. The datasets collected by Larry Winner is one of the excellent sets, which will be used in the article.

The implementations here use Python. Required 3rd party libraries are:

Read More

LeetCode Contest 51


第一题Baseball Game


  1. 整数:代表本轮得分
  2. "+":本轮得分为上两轮得分之和
  3. "D":本轮得分为上轮得分的两倍
  4. "C":上轮得分无效。


class Solution(object):
    def calPoints(self, ops):
        :type ops: List[str]
        :rtype: int
        points = []
        for op in ops:
            if op == 'C':
            elif op == 'D':
            elif op == '+':
        return sum(points)

Read More