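[npnet] Preface

I have long had the idea of writing a series of notes on machine learning. I call them "notes" rather than a "tutorial" because machine learning is only one of my hobbies and I am no expert in it. In fact, I have only picked up the most basic, superficial knowledge, and I may not fully understand even that. So I define this series as notes and say so plainly at the outset, which lets me write freely without worrying about leading readers astray. Still, since I am posting the notes online, I do hope others can draw some inspiration from them. If you find a mistake in my understanding or in the text, please leave a comment to let me know.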


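Editing a Raspbian Image

This note records how to edit a Raspbian image on Raspbian. Why would I need this? First, I often try all sorts of things on my Raspberry Pi and end up making a mess of the system, so I frequently have to re-flash Raspbian, hence the need to edit Raspbian images. Second, I currently use Mac OS, where editing image files is not that convenient, so I simply edit the Raspbian image from within a Raspbian system.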


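2018, On the Road

In 2018 I was always on the road, especially on I-71. I-71 is an interstate highway that runs northeast from Louisville, Kentucky, at its southern end, to Cleveland, Ohio, with a total length of about 345 miles. In past summers I would round up a few good friends and drive along I-71 to Cleveland, to visit the art museum, attend a concert, or go cycling in that national park that does not much resemble a national park. By this year, quite a few of those friends had graduated and left. Although I no longer have friends to share leisurely trips on I-71 with, the road has taken on a new meaning for me: it has become Li Zhiyi's Yangtze River.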

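Bu Suan Zi (卜算子)

I live at the head of the Yangtze, you live at its tail.
Day after day I long for you but cannot see you, though we drink from the same river.
When will this water cease to flow, when will this sorrow end?
I only wish your heart were like mine, and I will never betray this longing.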


Simple Grid Detection for Sudoku

Sudoku is a very interesting puzzle, and its rules are easy to learn. I like to compare sudoku with number theory: both are easy to understand but can be hard to solve. Fermat's Last Theorem, a problem that even middle school students can understand, took mathematicians 358 years to prove completely. Solving sudoku puzzles can take even longer: solving generalized sudoku puzzles has been proven to be an NP-complete problem! However, classical 9x9 sudoku puzzles are still feasible to solve with a backtracking algorithm. We are not going to talk about how to solve sudoku puzzles here, though; we are trying to solve another problem: detecting grids in images that contain sudoku puzzles.

To make the problem more concrete, let's make some assumptions about the image that contains a sudoku puzzle:

  1. The image only contains one sudoku puzzle
  2. The sudoku puzzle is the largest block in the image
  3. There is little or no distortion or rotation of the sudoku puzzle grid

In this article, we also focus only on simple solutions without heavy image processing. We will use Pillow to read images and convert them to grayscale, and NumPy to manipulate the images as arrays. For visualization, we use Matplotlib.
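As a rough illustration of this setup, here is a minimal sketch that loads an image with Pillow, converts it to grayscale, and looks for grid lines by averaging darkness along rows and columns. The projection approach, the function names `load_grayscale` and `line_positions`, and the threshold of 0.5 are assumptions made for illustration; they are not necessarily the method used in the full article. The sketch leans on the third assumption above (little or no rotation), since it only finds lines that run parallel to the image edges.

```python
import numpy as np


def load_grayscale(path):
    """Read an image file and return it as a 2-D float array in [0, 1]."""
    # Pillow is imported here so the array helper below works without it.
    from PIL import Image

    with Image.open(path) as img:
        return np.asarray(img.convert("L"), dtype=float) / 255.0


def line_positions(gray, axis, threshold=0.5):
    """Return the center index of each dark line in a grayscale array.

    axis=1 averages every row (finds horizontal lines);
    axis=0 averages every column (finds vertical lines).
    `threshold` is a hypothetical cut-off on the mean darkness.
    """
    darkness = 1.0 - gray.mean(axis=axis)
    is_line = darkness > threshold
    positions = []
    start = None
    for i, flag in enumerate(is_line):
        if flag and start is None:
            start = i  # a run of dark rows/columns begins
        elif not flag and start is not None:
            positions.append((start + i - 1) // 2)  # middle of the run
            start = None
    if start is not None:  # a run that touches the image border
        positions.append((start + len(is_line) - 1) // 2)
    return positions


# Synthetic example: a white 91x91 image with a dark line every 10 pixels.
demo = np.ones((91, 91))
demo[::10, :] = 0.0
demo[:, ::10] = 0.0
rows = line_positions(demo, axis=1)
cols = line_positions(demo, axis=0)
```

On a real photo, the array returned by `load_grayscale` would take the place of `demo`, and the threshold would likely need tuning to the lighting and line thickness.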


Pytorch on RaspberryPi

Finally, I successfully installed PyTorch on my Raspberry Pi 3B. I followed the instructions in "How to install PyTorch v0.3.1 on RaspberryPi 3B" and a blog post in Chinese, "在 RaspberryPi 上编译 PyTorch". I am running the latest Raspbian image, 2018-04-18-raspbian-stretch, and a self-compiled Python 3.6.5. If you are interested in compiling Python 3.6.5 or above on your own, check out "Installing Python 3.6 on Raspbian". In this post, I will write down the steps of my successful installation and share my compiled wheel file at the end.


Chapi

I recently built a Flask app that runs on my Raspberry Pi. Then I shut it down. Why? To access the app from outside my home network, I had to expose my Pi to the public internet. After only one day of exposure, my Pi received several attack attempts, and the Flask app then always replied 301. Here are some of the attack logs:

139.162.83.10 - "GET / HTTP/1.1" 404 -
45.33.115.189 - "GET / HTTP/1.0" 404 -
79.55.11.21 - "GET / HTTP/1.0" 404 -
5.101.40.78 - code 400, message Bad HTTP/0.9 request type ('\x03\x00\x00/*à\x00\x00\x00\x00\x00Cookie:')
5.101.40.78 - "/*àCookie: mstshash=Administr" HTTPStatus.BAD_REQUEST -
123.151.42.61 - "GET http://www.baidu.com/ HTTP/1.1" 404 -
158.85.81.116 - "GET / HTTP/1.1" 404 -

To be more precise, the Flask app was still running, but the home router seemed to have stopped forwarding requests to my Pi. In any case, I have no idea what happened. I just shut the app down and flashed a fresh image to my Pi.

I want a way to communicate with my Pi without exposing it. That is my motivation for starting the project chapi, which means "chat with pi".


A "Naive" Recommender System

Introduction

In this article, I will use a Naive Bayes classifier to build a simple personal recommender system: given a Chinese article, it returns the probability that I will like it. In mathematical terms, what we want to compute is

$$P(like|article)$$


By Bayes' theorem,

$$P(like|article) = \frac{P(article|like)P(like)}{P(article|like)P(like)+P(article|dislike)P(dislike)}$$
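To make the formula concrete, here is a minimal sketch of how it could be evaluated. It assumes each article has already been split into a list of words (Chinese text would need a word segmenter first, which is not shown), treats the words as independent given the label (the "naive" assumption), and uses add-one smoothing. The function names `train` and `prob_like`, the smoothing choice, and the toy data are all illustrative assumptions, not the implementation from the full article.

```python
import math
from collections import Counter


def train(docs):
    """Count words per label. docs is a list of (words, label) pairs,
    where label is "like" or "dislike"; both labels must appear."""
    word_counts = {"like": Counter(), "dislike": Counter()}
    doc_counts = Counter()
    for words, label in docs:
        word_counts[label].update(words)
        doc_counts[label] += 1
    vocab = set(word_counts["like"]) | set(word_counts["dislike"])
    return word_counts, doc_counts, vocab


def prob_like(words, model):
    """Return P(like | article) using Bayes' theorem."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_joint = {}
    for label in ("like", "dislike"):
        # log P(label), the prior
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for w in words:
            # log P(w | label) with add-one (Laplace) smoothing
            score += math.log(
                (word_counts[label][w] + 1) / (total_words + len(vocab))
            )
        log_joint[label] = score  # log of P(article | label) P(label)
    # Normalize in log space to avoid underflow on long articles.
    m = max(log_joint.values())
    numerator = math.exp(log_joint["like"] - m)
    denominator = sum(math.exp(s - m) for s in log_joint.values())
    return numerator / denominator


# Toy data, invented for illustration only.
docs = [
    (["python", "numpy"], "like"),
    (["python", "pytorch"], "like"),
    (["celebrity", "gossip"], "dislike"),
]
model = train(docs)
p_tech = prob_like(["python", "numpy"], model)
p_gossip = prob_like(["gossip"], model)
```

Working in log space is a design choice rather than part of the formula: multiplying many small word probabilities directly would underflow for articles of realistic length.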


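Decrypting the WeChat Chat History Database on Android

I have always wanted to export my own WeChat chat history and use it for all sorts of odd projects, such as analyzing whom I chatted with over the past year and what we talked about, or building a chatbot with my own personality. The problem is that WeChat currently offers no simple way to export chat history. From what I found online, WeChat on Android stores chat history (and much other data, such as contacts) in a database called EnMicroMsg.db. This database is created with SQLCipher and cannot be opened properly without the password. The password can actually be computed quite easily from information about one's phone and WeChat account, and I had already computed mine by following online tutorials. Yet I had never managed to decrypt my chat database. Only today, when I searched again for material on the WeChat database, did I discover that I need not only the password but also some additional database settings to open it correctly. The main purpose of this post is to record what I need to do to export my WeChat chat history, for future reference.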
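This excerpt does not spell out how the password is computed. A recipe commonly described in online tutorials, taken here as an assumption rather than something the post confirms, is to concatenate the phone's IMEI with the WeChat account's uin, take the MD5 hash, and keep the first seven hexadecimal characters. The sketch below shows only that step; the function name `derive_db_key` and the sample values are hypothetical, and the additional SQLCipher settings mentioned above are not guessed at here.

```python
import hashlib


def derive_db_key(imei: str, uin: str) -> str:
    """Return the first 7 lowercase hex characters of MD5(imei + uin).

    This follows a recipe commonly cited online; it is an assumption,
    not something confirmed by the post itself.
    """
    digest = hashlib.md5((imei + uin).encode("utf-8")).hexdigest()
    return digest[:7]


# Placeholder values for illustration; use your own device's values.
key = derive_db_key("123456789012345", "1234567890")
```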


Les misérables

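Yesterday I watched the musical Les Misérables with Yongjia (永佳) in Lucun (路村). The actors' nuanced performances and the excellent stage lighting deepened my love of theater a little more.

The theater at the end of the show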


Digit recognition, CNN

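Recently I read some material on convolutional neural networks (CNNs), and I want to reproduce an online tutorial using the mnist dataset so that I can better understand and use them. After comparing TensorFlow and PyTorch, I personally prefer PyTorch, so I will mostly use it from now on. PyTorch includes an API for fetching the mnist dataset, which makes preparing the data very easy. But for the purpose of learning PyTorch, I decided to prepare the data myself.

In an earlier post I used the generalized linear model Softmax to recognize these handwritten digits, and the lowest error rate I reached was 7.77%. With convolutional networks, which are widely used and effective in image recognition, we will get a lower error rate, as shown later in this post. This post will not explain much in words; it is mostly code.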
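As one possible way to prepare the data by hand, the sketch below parses the IDX binary format in which the MNIST files are commonly distributed: a big-endian header (magic number 2051 for images, 2049 for labels, followed by the counts and dimensions) and then raw unsigned bytes. It assumes the files have already been decompressed and read into memory. Whether the full post starts from these files is not stated in this excerpt, and the function names are hypothetical.

```python
import struct

import numpy as np


def parse_idx_images(raw: bytes) -> np.ndarray:
    """Parse an IDX image file into a (count, rows, cols) uint8 array."""
    # Header: four big-endian unsigned 32-bit integers.
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != 2051:
        raise ValueError(f"unexpected magic number {magic} for images")
    pixels = np.frombuffer(raw, dtype=np.uint8, offset=16)
    return pixels.reshape(count, rows, cols)


def parse_idx_labels(raw: bytes) -> np.ndarray:
    """Parse an IDX label file into a (count,) uint8 array."""
    # Header: two big-endian unsigned 32-bit integers.
    magic, count = struct.unpack(">II", raw[:8])
    if magic != 2049:
        raise ValueError(f"unexpected magic number {magic} for labels")
    labels = np.frombuffer(raw, dtype=np.uint8, offset=8)
    if labels.size != count:
        raise ValueError("label count does not match the header")
    return labels


# Tiny synthetic files: two 2x2 "images" and their two labels.
fake_images = struct.pack(">IIII", 2051, 2, 2, 2) + bytes(range(8))
fake_labels = struct.pack(">II", 2049, 2) + bytes([7, 3])
images = parse_idx_images(fake_images)
labels = parse_idx_labels(fake_labels)
```

The arrays returned by `np.frombuffer` are read-only views of the input bytes, so a copy (for example `images.astype(np.float32) / 255.0`) would be the natural next step before handing the data to a model.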
