Expectation-Maximization algorithm
In this article, I will collect my notes on the Expectation-Maximization (EM) algorithm, based on lectures 12 and 13 of Andrew Ng's online course. Given a set of unlabeled data points, EM iteratively tries to determine the distribution of the data, assuming that every data point carries an implicit label (an unobserved latent variable). For simplicity, we shall limit ourselves to the case where there are only finitely many possible labels.
Description of the problem
Given a set of unlabeled data \(\{x^{(1)}, \dots, x^{(m)}\}\), our goal is to determine \(P(x)\), the distribution of \(x\), with the following assumptions.
1. Each data point \(x^{(i)}\) is implicitly associated with an unobserved label \(z^{(i)}\), which takes one of finitely many values \(1, \dots, j\).
2. The labels follow a multinomial distribution, \(P(z = l) = \phi_l\) with \(\phi_l \ge 0\) and \(\sum_{l=1}^{j} \phi_l = 1\), and conditional on \(z = l\) the data point is drawn from a distribution \(P(x \mid z = l; a_l)\) parametrized by \(a_l\).

Assumptions 1 and 2 give us a set of parameters \(\theta = (\phi_1, \dots, \phi_j, a_1,\dots, a_j)\) and the model

\[ P(x; \theta) = \sum_{l=1}^{j} P(x \mid z = l; a_l)\,\phi_l. \]
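To make the setup concrete, here is a minimal sketch of one special case, assuming (purely for illustration) that each conditional distribution is a univariate Gaussian, so that \(a_l = (\mu_l, \sigma_l)\) and

\[ P(x; \theta) = \sum_{l=1}^{j} \phi_l \,\mathcal{N}(x; \mu_l, \sigma_l^2). \]

The short Python snippet below evaluates this mixture density at a point; the mixing weights, means, and standard deviations are hypothetical values chosen only for the example, not anything taken from the lectures.

```python
import numpy as np

# Hypothetical two-component 1-D Gaussian mixture, used only to illustrate
# the model P(x; theta) = sum_l phi_l * P(x | z = l; a_l).
phi = np.array([0.3, 0.7])    # mixing proportions phi_1, phi_2
mu = np.array([-2.0, 1.0])    # component means (each a_l is (mu_l, sigma_l))
sigma = np.array([0.5, 1.5])  # component standard deviations

def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def mixture_density(x):
    """P(x; theta): weighted sum of the component densities at x."""
    return sum(p * gaussian_pdf(x, m, s) for p, m, s in zip(phi, mu, sigma))

# Example: density of the mixture at a single point.
print(mixture_density(0.0))
```

The logarithm of this quantity, summed over all data points, is exactly the log likelihood that we set out to maximize below.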
We want to find this set of parameters so that the likelihood function

\[ L(\theta) = \prod_{i=1}^{m} P(x^{(i)}; \theta) \]

is maximized, or equivalently, so that the log likelihood function below is maximized:

\[ \ell(\theta) = \sum_{i=1}^{m} \log P(x^{(i)}; \theta), \]
where