We will use the models implemented in the previous notes to build a convolutional neural network that recognizes the Fashion MNIST dataset. In an earlier note, a sequential network of two linear models reached an accuracy of about 85%; we expect a convolutional neural network to do better.

import numpy as np
import matplotlib.pyplot as plt

%run import_npnet.py
npnet = import_npnet(6)

After loading the data, we reshape the image data into four-dimensional arrays to serve as input to Conv.

train_images, train_labels = npnet.load_fashion_mnist('train')
test_images, test_labels = npnet.load_fashion_mnist('t10k')
train_images = train_images.reshape(-1, 1, 28, 28)
test_images = test_images.reshape(-1, 1, 28, 28)
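The arrays are now in (N, C, H, W) layout: batch size, channels, height, width. Since Fashion MNIST's standard splits contain 60,000 training and 10,000 test images, a quick shape check should show (60000, 1, 28, 28) and (10000, 1, 28, 28):

print(train_images.shape, test_images.shape)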

Our first neural network consists of two convolution layers, two max pooling layers, and one linear layer, with npnet.ReLU as the activation function. There is one problem here: a convolution layer outputs a four-dimensional array, while a linear layer accepts only a two-dimensional array as input. We need a middleware that flattens the convolution layer's four-dimensional output into a two-dimensional array for the linear layer, and that restores the two-dimensional gradient array propagated back from the linear layer into a four-dimensional array for the convolution layer.

Using numpy.reshape, the middleware can be implemented as follows:

# import
class Flatten(npnet.Model):

    def forward(self, input):
        # remember the input shape so backward can restore it
        self.input_shape = input.shape
        # collapse everything after the batch dimension into one axis
        return input.reshape(input.shape[0], -1)

    def backward(self, grad):
        # reshape the gradient back into the conv layer's output shape
        return grad.reshape(self.input_shape)

Let's check the gradients:

npnet.GradientCheck(Flatten(), input_shape=(10, 3, 28, 28)).check()
True
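
As a quick sanity check on the shapes, here is a minimal usage sketch with an arbitrary array:

flatten = Flatten()
x = np.zeros((10, 16, 7, 7))           # e.g. a conv output: batch of 10
y = flatten.forward(x)                 # flattened to (10, 784)
g = flatten.backward(np.ones_like(y))  # gradient restored to (10, 16, 7, 7)
print(y.shape, g.shape)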

By inserting the Flatten middleware between the last convolution layer and the linear layer, we can now connect them. Note that each 2×2 max pooling halves the spatial size, so the 28×28 input shrinks to 7×7 after two poolings, and the final convolution layer outputs 16 × 7 × 7 = 16 × 49 features per image; this is the input width of the linear layer. Apart from the model definition, everything else is the same as training the sequential networks in the earlier notes.

%%time
# model
layers = [
    npnet.Conv(1, 8, kernel=3, stride=1, padding=1),
    npnet.MaxPool(kernel=2, stride=2, padding=0),
    npnet.ReLU(),
    npnet.Conv(8, 16, kernel=3, stride=1, padding=1),
    npnet.MaxPool(kernel=2, stride=2, padding=0),
    npnet.ReLU(),
    Flatten(),
    npnet.Linear(16 * 49, 10)
]
model = npnet.Sequential(layers)

# hyperparameters
lr = 1e-4
batch_size = 10
epoch = 20
weight_decay = 3e-2 # 1e-2

# criterion and optimizer
criterion = npnet.CrossEntropy()
optim = npnet.Adam(model, lr=lr, weight_decay=weight_decay)

train_loss, train_accuracy, test_loss, test_accuracy = \
    npnet.train_fashion_mnist(
        model=model, criterion=criterion, optim=optim,
        train_images=train_images, train_labels=train_labels,
        batch_size=batch_size, epoch=epoch,
        test_images=test_images, test_labels=test_labels,
        profile=True
    )
npnet:12: RuntimeWarning: divide by zero encountered in log
CPU times: user 1h 27min 22s, sys: 50.9 s, total: 1h 28min 13s
Wall time: 1h 29min 20s
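
The RuntimeWarning above comes from a np.log call inside npnet (presumably in npnet.CrossEntropy) receiving a predicted probability that underflowed to zero. It does not stop training; a common guard, shown here only as a sketch and not necessarily how npnet.CrossEntropy handles it, is to clip probabilities away from zero before taking the log:

probs = np.array([0.0, 0.3, 0.7])            # one probability underflowed to 0
eps = 1e-12
safe_log = np.log(np.clip(probs, eps, 1.0))  # no divide-by-zero warning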

Training the convolutional network is much slower than the earlier sequential network of only two linear models, which needed merely about 5 minutes for 50 epochs: here, roughly an hour and a half bought us only 20 epochs, with most of the time spent in the convolution layers.
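
A rough count of multiply-accumulate operations (MACs) per image supports this claim; the back-of-the-envelope estimate below ignores the cost of pooling and activations:

# Conv(1, 8, kernel=3): each of the 8 * 28 * 28 outputs costs 1 * 3 * 3 MACs
conv1 = 8 * 28 * 28 * (1 * 3 * 3)   # 56,448
# Conv(8, 16, kernel=3) on the pooled 14x14 maps: 8 * 3 * 3 MACs per output
conv2 = 16 * 14 * 14 * (8 * 3 * 3)  # 225,792
# Linear(16 * 49, 10): one MAC per weight
linear = 16 * 49 * 10               # 7,840
print((conv1 + conv2) / linear)     # the conv layers cost about 36x more

Now let's look at how the model performs.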

npnet.plot_loss_and_accuracy(train_loss, train_accuracy,
                             test_loss, test_accuracy)

print(train_accuracy[-1], test_accuracy[-1])
0.90595 0.8881

The model's (test) accuracy has improved to about 88%. To get a better result, we add another linear model to the network above:

%%time
# model
layers = [
    npnet.Conv(1, 8, kernel=3, stride=1, padding=1),
    npnet.MaxPool(kernel=2, stride=2, padding=0),
    npnet.ReLU(),
    npnet.Conv(8, 16, kernel=3, stride=1, padding=1),
    npnet.MaxPool(kernel=2, stride=2, padding=0),
    npnet.ReLU(),
    Flatten(),
    npnet.Linear(16 * 49, 100),
    npnet.ReLU(),
    npnet.Linear(100, 10)
]
model = npnet.Sequential(layers)

# hyperparameters
lr = 1e-4
batch_size = 10
epoch = 20
weight_decay = 4e-2 # 2e-2

# criterion and optimizer
criterion = npnet.CrossEntropy()
optim = npnet.Adam(model, lr=lr, weight_decay=weight_decay)

train_loss, train_accuracy, test_loss, test_accuracy = \
    npnet.train_fashion_mnist(
        model=model, criterion=criterion, optim=optim,
        train_images=train_images, train_labels=train_labels,
        batch_size=batch_size, epoch=epoch,
        test_images=test_images, test_labels=test_labels,
        profile=True
    )
npnet:12: RuntimeWarning: divide by zero encountered in log
CPU times: user 1h 30min 18s, sys: 53.2 s, total: 1h 31min 11s
Wall time: 1h 32min 1s

The results:

npnet.plot_loss_and_accuracy(train_loss, train_accuracy,
                             test_loss, test_accuracy)

print(train_accuracy[-1], test_accuracy[-1])
0.9075333333333333 0.8945

We find that adding a linear model does not noticeably increase the training time, which further confirms that the bulk of the time is spent in the convolution layers. Because the model has become more complex, we correspondingly raised the L2 regularization coefficient weight_decay to guard against overfitting.
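
For reference, weight decay conventionally enters training by adding the gradient of the L2 penalty to each parameter's gradient before the optimizer step. A minimal sketch of that convention follows; npnet.Adam's actual implementation is not shown in this note and may differ:

# hypothetical parameter and gradient, for illustration only
param = np.random.randn(100, 10)
grad = np.random.randn(100, 10)

weight_decay = 4e-2
# the L2 penalty (weight_decay / 2) * ||param||^2 adds
# weight_decay * param to the gradient
grad = grad + weight_decay * param
# ... an optimizer such as Adam then consumes this regularized gradient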