Support Vector Machine
This article is my notes on support vector machine for Lecture 7 and 8 of Machine Learning by Andrew Ng.
Intuition
In a binary classification problem, we can use logistic regression
where \(g\) is the sigmoid function with a figure of it below.
Then given input \(x\), the model predicts \(1\) if and only if \(\theta^x \ge 0\), in which case \(h_\theta(x) = g(\theta^T x) \ge 0.5\); and it predicts \(0\) if and only if \(\theta^T x < 0\). Moreover, based on the shape of sigmoid function, if \(\theta^T x >> 0\), we are very confident that \(y=1\). Likewise, if \(\theta^T x << 0\), we are very confident that \(y=0\). Therefore, we hope that for the training set \(\{(x^{(i)}, y^{(i)})\}_{i=1}^m\), we can find such a \(\theta\) that \(\theta^T x^{(i)} >> 0\) if \(y^{(i)}=1\) and \(\theta^T x^{(i)} << 0\) if \(y^{(i)}=0\).