Support Vector Machine

2017-10-08

This article is my notes on support vector machine for Lecture 7 and 8 of Machine Learning by Andrew Ng.

Intuition

In a binary classification problem, we can use logistic regression

$$h_\theta(x) = \frac{1}{1+e^{-\theta^T x}} = g(\theta^T x),$$

where $g$ is the sigmoid function with a figure of it below.

Then given input $x$, the model predicts $1$ if and only if $\theta^x \ge 0$, in which case $h_\theta(x) = g(\theta^T x) \ge 0.5$; and it predicts $0$ if and only if $\theta^T x < 0$. Moreover, based on the shape of sigmoid function, if $\theta^T x >> 0$, we are very confident that $y=1$. Likewise, if $\theta^T x << 0$, we are very confident that $y=0$. Therefore, we hope that for the training set $\{(x^{(i)}, y^{(i)})\}_{i=1}^m$, we can find such a $\theta$ that $\theta^T x^{(i)} >> 0$ if $y^{(i)}=1$ and $\theta^T x^{(i)} << 0$ if $y^{(i)}=0$.

Generative Model

2017-10-02

This article is my notes on generative model for Lecture 5 and 6 of Machine Learning by Andrew Ng. What we do in logistic regression using generalized linear model is that, we approximate $P(y|x)$ using given data. This kind of learning algorithms is discriminative, in which we predict $y$ based on the input features $x$. On the contrary, generative model is to model $P(x|y)$, the probability of the features $x$ given class $y$. In other words, we want to study how the features structure looks like given a class $y$. If we also learn what $P(y)$ is, we can easily recover $P(y|x)$, for example, in the binary classification problem,

$$\begin{equation} P(y=1|x) = \frac{P(x|y=1)P(y=1)}{P(x)}, \label{eqn:bayes} \end{equation}$$

where $P(x) = P(x|y=0)P(y=0) + P(x|y=1)P(y=1)$.

In this article, we are going to see a simple example of generative model on Gaussian discriminant analysis and Naive Bayes.

LeetCode Contest 52

2017-10-01

这次比赛跪了。。

第一题Repeated String Match

给定字符串A和B，问至少要重复多少次A使得B是A的子字符串。

二分查找。首先算出最少需要多少次才有可能使得B是子字符串：必须要保证重复后的A的长度不小于B的长度。我们可以简单算出最少需要重复：

i = (len(A) + len(B) - 1) / len(A)
最大次数j的上界是2i。然后对[i, j]这个区间进行二分查找。

这题竟然WA了2次(因为typo)，TLE一次(一开始用j＝10i)。。。

class Solution(object):
    def repeatedStringMatch(self, A, B):
        """
        :type A: str
        :type B: str
        :rtype: int
        """
        i = (len(A) + len(B) - 1) / len(A)
        j = 2 * i
        ans = -1
        while i <= j:
            k = (i+j) / 2
            tmp = A * k
            if tmp.find(B) != -1:
                ans = k
                j = k - 1
            else:
                i = k + 1
        return ans

Generalized Linear Model (Examples)

2017-09-23

This article is a companion article to my another post Generalized Linear Model. In this article, I will implement some of the learning algorithms in Generalized Linear Model. To be more specific, I will do some examples on linear regression and logistic regression. With some effort, google search gives me some very good example data sets to work with. The datasets collected by Larry Winner is one of the excellent sets, which will be used in the article.

The implementations here use Python. Required 3rd party libraries are:

Requests: use to get online data
Matplotlib: use for plotting
NumPy: use for matrix calculation

LeetCode Contest 51

2017-09-23

这次比赛做得还算可以，只是最后一题看错题目WA了3次😭

第一题Baseball Game

给定一串字符串，每个字符串有四种可能：

整数：代表本轮得分
"+"：本轮得分为上两轮得分之和
"D"：本轮得分为上轮得分的两倍
"C"：上轮得分无效。

问最终得分之和。

class Solution(object):
    def calPoints(self, ops):
        """
        :type ops: List[str]
        :rtype: int
        """
        points = []
        for op in ops:
            if op == 'C':
                points.pop()
            elif op == 'D':
                points.append(points[-1]*2)
            elif op == '+':
                points.append(points[-1]+points[-2])
            else:
                points.append(int(op))
        return sum(points)

Generalized Linear Model

2017-09-20

This article on Generalized Linear Model (GLM) is based on the first four lectures of Machine Learning by Andrew Ng. But the structure of the article is quite different from the lecture. I will talk about exponential family of distributions first. Then I will discuss the general idea of GLM. Finally, I will try to derive some well known learning algorithms from GLM.

Exponential Family

A family of distributions is an exponential family if it can be parametrized by vector $\eta$ in the form $$P(y; \eta) = b(y)\exp(\eta^{T} T(y)-a(\eta)),$$ where $T(y)$ and $b(y)$ are (vector-valued) functions in terms of $y$, and $a(\eta)$ is a function in terms of $\eta$.

$\eta$ is called the natural parameter and $T(y)$ is called the sufficient statistic.

简单的python爬虫

2017-09-12

一个网络爬虫大致可以分成三个部分：获取网页，提取信息以及保存信息。Python有很多爬虫框架，其中最出名要数Scrapy了。这也是我唯一用过的Python爬虫框架，用起来很省心。让我苦恼的是，Scrapy在我的Raspberry Pi Zero W安装起来很麻烦，而且我觉得我爬取的网页比较容易处理，没有必要用这么重量级的框架。抱着学习的心态，我开始自己造轮子了。

在造轮子之前我找到些轻量级的框架，Sukhoi是我比较喜欢的一个。该库作者iogf使用了自己的异步库、网络库来写这个框架。这让我很佩服他。我写的框架受到了Sukhoi很大的启发与影响。

LeetCode Contest 49

2017-09-10

这次比赛对python不友好啊，第三题在比赛的时候一直TLE，比赛结束后用C++实现同样的算法就过了😭

第一题Longest Continuous Increasing Subsequence

给定一个数组，找出连续严格递增子数组的最大长度。

扫描一遍数组即可：如果nums[i], nums[i+1], ..., nums[j-1]是当前最长严格递增子数组，下次扫描的时候可以直接从j开始。

class Solution(object):
    def findLengthOfLCIS(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        ans = i = 0
        while i < len(nums):
            j = i + 1
            while j < len(nums) and nums[j] > nums[j-1]:
                j += 1
            ans = max(ans, j-i)
            i = j
        return ans

LeetCode Contest 48

2017-09-02

这次比赛有点简单的样子。

第一题Second Minimum Node In a Binary Tree

给定一棵特别的二叉树，这棵二叉树的节点要么有两个子节点，要么没有节点。如果有两个子节点，那么该节点的值为子节点的最小值。求树的第二最小值。

我当时什么都没有想，简单地将树的所有值取下来排序取出第二最小值。

# Definition for a binary tree node.
# class TreeNode(object):
#     def __init__(self, x):
#         self.val = x
#         self.left = None
#         self.right = None

class Solution(object):
    def findSecondMinimumValue(self, root):
        """
        :type root: TreeNode
        :rtype: int
        """
        nums = set()
        def dfs(root):
            if not root:
                return
            nums.add(root.val)
            dfs(root.left)
            dfs(root.right)
        dfs(root)
        return -1 if len(nums)<2 else sorted(list(nums))[1]

LeetCode Contest 47

2017-08-27

这次比赛做完四道题了 😊

第一题Non-decreasing Array

给定一个数组，问是否可以最多修改一个元素，使得整个数组非递降。

我是首先找出第一个i，满足nums[i] < nums[i-1]。这样我们必须要修改i或者i-1的位置，只要分别检查修改这两个位置是否可以得到一个非递降数组就行。因为两个元素相等也是非递降，所以修改一个位置可以等价于将那个位置的数字去除。

class Solution(object):
    def checkPossibility(self, nums):
        """
        :type nums: List[int]
        :rtype: bool
        """
        def check(i):
            if i >= len(nums):
                return True
            temp = [nums[j] for j in xrange(len(nums)) if j != i]
            return all(temp[j]>=temp[j-1] for j in xrange(1, len(temp)))
        i = 1
        while i < len(nums) and nums[i] >= nums[i-1]:
            i += 1
        return check(i) or check(i-1)

Previous 1 ... 4 5 6 7 8 ... 11 Next