李航统计学习方法（二）-感知机算法

感知机算法

《统计学习方法》系列笔记的第二篇，对应原著第二章。大量引用原著讲解，加入了自己的理解。对书中算法采用Python实现，并用Matplotlib可视化了动画出来，应该算是很硬派了。一套干货下来，很是辛苦，要是能坚持下去就好。

概念

感知机是二分类模型，输入实例的特征向量，输出实例的±类别。

感知机模型

定义

假设输入空间是

输出空间是

image

x和y分属这两个空间，那么由输入空间到输出空间的如下函数：

image

称为感知机。其中，w和b称为感知机模型参数，

image

叫做权值或权值向量，

image

叫做偏置，w·x表示向量w和x的内积。sign是一个函数：

image

感知机的几何解释是，线性方程

image

将特征空间划分为正负两个部分：

image

这个平面（2维时退化为直线）称为分离超平面。

感知机学习策略

数据集的线性可分性

定义

给定数据集

image

其中

image

如果存在某个超平面S

image

能够完全正确地将正负实例点全部分割开来，则称T线性可分，否则称T线性不可分

感知机学习策略

假定数据集线性可分，我们希望找到一个合理的损失函数。

一个朴素的想法是采用误分类点的总数，但是这样的损失函数不是参数w，b的连续可导函数，不可导自然不能把握函数的变化，也就不易优化（不知道什么时候该终止训练，或终止的时机不是最优的）。

另一个想法是选择所有误分类点到超平面S的总距离。为此，先定义点x₀到平面S的距离：

image

分母

image

是w的L₂范数，所谓L₂范数，指的是向量各元素的平方和然后求平方根（长度）。这个式子很好理解，回忆中学学过的点到平面的距离：

image

此处的点到超平面S的距离的几何意义就是上述距离在多维空间的推广。

又因为，如果点i被误分类，一定有

image

成立，所以我们去掉了绝对值符号，得到误分类点到超平面S的距离公式：

image

假设所有误分类点构成集合M，那么所有误分类点到超平面S的总距离为

image

分母作用不大，反正一定是正的，不考虑分母，就得到了感知机学习的损失函数：

image

感知机学习算法

原始形式

感知机学习算法是对以下最优化问题的算法：

image

感知机学习算法是误分类驱动的，先随机选取一个超平面，然后用梯度下降法不断极小化上述损失函数。损失函数的梯度由：

image

给出。所谓梯度，是一个向量，指向的是标量场增长最快的方向，长度是最大变化率。所谓标量场，指的是空间中任意一个点的属性都可以用一个标量表示的场（个人理解该标量为函数的输出）。

随机选一个误分类点i，对参数w，b进行更新：

image

上式

image

是学习率。损失函数的参数加上梯度上升的反方向，于是就梯度下降了。所以，上述迭代可以使损失函数不断减小，直到为0。于是得到了原始形式的感知机学习算法：

image

对于此算法，使用下面的例子作为测试数据：

image

给出Python实现和可视化代码如下：
终于到了最激动人心的时刻了，有了上述知识，就可以完美地可视化这个简单的算法：

    import copy
    from matplotlib import pyplot as plt
    from matplotlib import animation
     
    training_set = [[(3, 3), 1], [(4, 3), 1], [(1, 1), -1]]
    w = [0, 0]
    b = 0
    history = []
     
     
    def update(item):
        """
        update parameters using stochastic gradient descent
        :param item: an item which is classified into wrong class
        :return: nothing
        """
        global w, b, history
        w[0] += 1 * item[1] * item[0][0]
        w[1] += 1 * item[1] * item[0][1]
        b += 1 * item[1]
        print w, b
        history.append([copy.copy(w), b])
        # you can uncomment this line to check the process of stochastic gradient descent
     
     
    def cal(item):
        """
        calculate the functional distance between 'item' an the dicision surface. output yi(w*xi+b).
        :param item:
        :return:
        """
        res = 0
        for i in range(len(item[0])):
            res += item[0][i] * w[i]
        res += b
        res *= item[1]
        return res
     
     
    def check():
        """
        check if the hyperplane can classify the examples correctly
        :return: true if it can
        """
        flag = False
        for item in training_set:
            if cal(item) <= 0:
                flag = True
                update(item)
        # draw a graph to show the process
        if not flag:
            print "RESULT: w: " + str(w) + " b: " + str(b)
        return flag
     
     
    if __name__ == "__main__":
        for i in range(1000):
            if not check(): break
     
        # first set up the figure, the axis, and the plot element we want to animate
        fig = plt.figure()
        ax = plt.axes(xlim=(0, 2), ylim=(-2, 2))
        line, = ax.plot([], [], 'g', lw=2)
        label = ax.text([], [], '')
     
        # initialization function: plot the background of each frame
        def init():
            line.set_data([], [])
            x, y, x_, y_ = [], [], [], []
            for p in training_set:
                if p[1] > 0:
                    x.append(p[0][0])
                    y.append(p[0][1])
                else:
                    x_.append(p[0][0])
                    y_.append(p[0][1])
     
            plt.plot(x, y, 'bo', x_, y_, 'rx')
            plt.axis([-6, 6, -6, 6])
            plt.grid(True)
            plt.xlabel('x')
            plt.ylabel('y')
            plt.title('Perceptron Algorithm (www.hankcs.com)')
            return line, label
     
     
        # animation function.  this is called sequentially
        def animate(i):
            global history, ax, line, label
     
            w = history[i][0]
            b = history[i][1]
            if w[1] == 0: return line, label
            x1 = -7
            y1 = -(b + w[0] * x1) / w[1]
            x2 = 7
            y2 = -(b + w[0] * x2) / w[1]
            line.set_data([x1, x2], [y1, y2])
            x1 = 0
            y1 = -(b + w[0] * x1) / w[1]
            label.set_text(history[i])
            label.set_position([x1, y1])
            return line, label
     
        # call the animator.  blit=true means only re-draw the parts that have changed.
        print history
        anim = animation.FuncAnimation(fig, animate, init_func=init, frames=len(history), interval=1000, repeat=True,
                                       blit=True)
        plt.show()
        anim.save('perceptron.gif', fps=2, writer='imagemagick')

可视化

image.png

可见超平面被误分类点所吸引，朝着它移动，使得两者距离逐步减小，直到正确分类为止。通过这个动画，是不是对感知机的梯度下降算法有了更直观的感悟呢？

算法的收敛性

记输入向量加进常数1的拓充形式

image

，其最大长度为

image

，记感知机的参数向量

image

，设满足条件

image

的超平面可以将数据集完全正确地分类，定义最小值伽马：

image

感知机学习算法的对偶形式

对偶指的是，将w和b表示为测试数据i的线性组合形式，通过求解系数得到w和b。具体说来，如果对误分类点i逐步修改wb修改了n次，则w，b关于i的增量分别为

image

，这里

image

，则最终求解到的参数分别表示为：

image

于是有算法2.2：

image

则误分类次数k满足：

image

证明请参考《统计学习方法》P31。

感知机对偶算法代码

涉及到比较多的矩阵计算，于是用NumPy比较多：

    # -*- coding:utf-8 -*-
    # Filename: train2.2.py
    # Author：hankcs
    # Date: 2015/1/31 15:15
    import numpy as np
    from matplotlib import pyplot as plt
    from matplotlib import animation
     
    # An example in that book, the training set and parameters' sizes are fixed
    training_set = np.array([[[3, 3], 1], [[4, 3], 1], [[1, 1], -1]])
     
    a = np.zeros(len(training_set), np.float)
    b = 0.0
    Gram = None
    y = np.array(training_set[:, 1])
    x = np.empty((len(training_set), 2), np.float)
    for i in range(len(training_set)):
        x[i] = training_set[i][0]
    history = []
     
    def cal_gram():
        """
        calculate the Gram matrix
        :return:
        """
        g = np.empty((len(training_set), len(training_set)), np.int)
        for i in range(len(training_set)):
            for j in range(len(training_set)):
                g[i][j] = np.dot(training_set[i][0], training_set[j][0])
        return g
     
     
    def update(i):
        """
        update parameters using stochastic gradient descent
        :param i:
        :return:
        """
        global a, b
        a[i] += 1
        b = b + y[i]
        history.append([np.dot(a * y, x), b])
        # print a, b # you can uncomment this line to check the process of stochastic gradient descent
     
     
    # calculate the judge condition
    def cal(i):
        global a, b, x, y
     
        res = np.dot(a * y, Gram[i])
        res = (res + b) * y[i]
        return res
     
     
    # check if the hyperplane can classify the examples correctly
    def check():
        global a, b, x, y
        flag = False
        for i in range(len(training_set)):
            if cal(i) <= 0:
                flag = True
                update(i)
        if not flag:
     
            w = np.dot(a * y, x)
            print "RESULT: w: " + str(w) + " b: " + str(b)
            return False
        return True
     
     
    if __name__ == "__main__":
        Gram = cal_gram()  # initialize the Gram matrix
        for i in range(1000):
            if not check(): break
     
        # draw an animation to show how it works, the data comes from history
        # first set up the figure, the axis, and the plot element we want to animate
        fig = plt.figure()
        ax = plt.axes(xlim=(0, 2), ylim=(-2, 2))
        line, = ax.plot([], [], 'g', lw=2)
        label = ax.text([], [], '')
     
        # initialization function: plot the background of each frame
        def init():
            line.set_data([], [])
            x, y, x_, y_ = [], [], [], []
            for p in training_set:
                if p[1] > 0:
                    x.append(p[0][0])
                    y.append(p[0][1])
                else:
                    x_.append(p[0][0])
                    y_.append(p[0][1])
     
            plt.plot(x, y, 'bo', x_, y_, 'rx')
            plt.axis([-6, 6, -6, 6])
            plt.grid(True)
            plt.xlabel('x')
            plt.ylabel('y')
            plt.title('Perceptron Algorithm 2 (www.hankcs.com)')
            return line, label
     
     
        # animation function.  this is called sequentially
        def animate(i):
            global history, ax, line, label
     
            w = history[i][0]
            b = history[i][1]
            if w[1] == 0: return line, label
            x1 = -7.0
            y1 = -(b + w[0] * x1) / w[1]
            x2 = 7.0
            y2 = -(b + w[0] * x2) / w[1]
            line.set_data([x1, x2], [y1, y2])
            x1 = 0.0
            y1 = -(b + w[0] * x1) / w[1]
            label.set_text(str(history[i][0]) + ' ' + str(b))
            label.set_position([x1, y1])
            return line, label
     
        # call the animator.  blit=true means only re-draw the parts that have changed.
        anim = animation.FuncAnimation(fig, animate, init_func=init, frames=len(history), interval=1000, repeat=True,
                                       blit=True)
        plt.show()
        # anim.save('perceptron2.gif', fps=2, writer='imagemagick')

可视化

image

与算法1的结果相同，我们也可以将数据集改一下：

training_set = np.array([[[3, 3], 1], [[4, 3], 1], [[1, 1], -1], [[5, 2], -1]])

</pre>

会得到一个复杂一些的结果：

image

读后感

通过最简单的模型，学习到ML中的常用概念和常见流程。

人面猴
序言：七十年代末，一起剥皮案震惊了整个滨河市，随后出现的几起案子，更是在滨河造成了极大的恐慌，老刑警刘岩，带你破解...
沈念sama阅读 158,560评论 4赞 361
死咒
序言：滨河连续发生了三起死亡事件，死亡现场离奇诡异，居然都是意外死亡，警方通过查阅死者的电脑和手机，发现死者居然都...
沈念sama阅读 67,104评论 1赞 291
救了他两次的神仙让他今天三更去死
文/潘晓璐我一进店门，熙熙楼的掌柜王于贵愁眉苦脸地迎上来，“玉大人，你说我怎么就摊上这事。” “怎么了？”我有些...
开封第一讲书人阅读 108,297评论 0赞 243
道士缉凶录：失踪的卖姜人
文/不坏的土叔我叫张陵，是天一观的道长。经常有香客问我，道长，这世上最难降的妖魔是什么？我笑而不...
开封第一讲书人阅读 43,869评论 0赞 204
港岛之恋（遗憾婚礼）
正文为了忘掉前任，我火速办了婚礼，结果婚礼上，老公的妹妹穿的比我还像新娘。我一直安慰自己，他们只是感情好，可当我...
茶点故事阅读 52,275评论 3赞 287
恶毒庶女顶嫁案：这布局不是一般人想出来的
文/花漫我一把揭开白布。她就那样静静地躺着，像睡着了一般。火红的嫁衣衬着肌肤如雪。梳的纹丝不乱的头发上，一...
开封第一讲书人阅读 40,563评论 1赞 216
城市分裂传说
那天，我揣着相机与录音，去河边找鬼。笑死，一个胖子当着我的面吹牛，可吹牛的内容都是我干的。我是一名探鬼主播，决...
沈念sama阅读 31,833评论 2赞 312
双鸳鸯连环套：你想象不到人心有多黑
文/苍兰香墨我猛地睁开眼，长吁一口气：“原来是场噩梦啊……” “哼！你这毒妇竟也来了？” 一声冷哼从身侧响起，我...
开封第一讲书人阅读 30,543评论 0赞 197
万荣杀人案实录
序言：老挝万荣一对情侣失踪，失踪者是张志新（化名）和其女友刘颖，没想到半个月后，有当地人在树林里发现了一具尸体，经...
沈念sama阅读 34,245评论 1赞 241
护林员之死
正文独居荒郊野岭守林人离奇死亡，尸身上长有42处带血的脓包…… 初始之章·张勋以下内容为张勋视角年9月15日...
茶点故事阅读 30,512评论 2赞 244
白月光启示录
正文我和宋清朗相恋三年，在试婚纱的时候发现自己被绿了。大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
茶点故事阅读 32,011评论 1赞 258
活死人
序言：一个原本活蹦乱跳的男人离奇死亡，死状恐怖，灵堂内的尸体忽然破棺而出，到底是诈尸还是另有隐情，我是刑警宁泽，带...
沈念sama阅读 28,359评论 2赞 253
日本核电站爆炸内幕
正文年R本政府宣布，位于F岛的核电站，受9级特大地震影响，放射性物质发生泄漏。R本人自食恶果不足惜，却给世界环境...
茶点故事阅读 33,006评论 3赞 235
男人毒药：我在死后第九天来索命
文/蒙蒙一、第九天我趴在偏房一处隐蔽的房顶上张望。院中可真热闹，春花似锦、人声如沸。这庄子的主人今日做“春日...
开封第一讲书人阅读 26,062评论 0赞 8
一桩弑父案，背后竟有这般阴谋
文/苍兰香墨我抬头看了看天上的太阳。三九已至，却和暖如春，着一层夹袄步出监牢的瞬间，已是汗流浃背。一阵脚步声响...
开封第一讲书人阅读 26,825评论 0赞 194
情欲美人皮
我被黑心中介骗来泰国打工，没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留，地道东北人。一个月前我还...
沈念sama阅读 35,590评论 2赞 273
代替公主和亲
正文我出身青楼，却偏偏与公主长得像，于是被迫代替她去往敌国和亲。传闻我的和亲对象是个残疾皇子，可洞房花烛夜当晚...
茶点故事阅读 35,501评论 2赞 268

李航统计学习方法（二）-感知机算法

感知机算法

概念

感知机模型

感知机学习策略

感知机学习算法

原始形式

可视化

算法的收敛性

感知机学习算法的对偶形式

可视化

读后感

推荐阅读更多精彩内容