Building a BiLSTM in PyTorch (Reproducible/Deterministic)

What is an LSTM

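If you don't yet know what an LSTM is, please head over to
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
The first time I saw an LSTM, I marveled at how many parameters the network has. After working with it more, I realized that the essence of the LSTM lies in its three gates (forget, input and output), and the formulas around these three gates all look much alike, so memorizing the LSTM equations is actually quite easy.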
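
To make the "three gates, nearly identical formulas" point concrete, here is a minimal sketch of a single LSTM time step written out by hand and compared against nn.LSTMCell. The function name lstm_step and the tensor sizes are my own choices for illustration; the gate order (input, forget, cell candidate, output) follows how PyTorch lays out its weight matrices.

```python
import torch
import torch.nn as nn

def lstm_step(x, h, c, w_ih, w_hh, b_ih, b_hh):
    """One LSTM time step written out gate by gate."""
    # All four affine maps are computed at once, then split in PyTorch's
    # order: input gate, forget gate, cell candidate, output gate.
    gates = x @ w_ih.t() + b_ih + h @ w_hh.t() + b_hh
    i, f, g, o = gates.chunk(4, dim=1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)
    c_next = f * c + i * g            # forget part of the old memory, write new memory
    h_next = o * torch.tanh(c_next)   # expose part of the memory as the output
    return h_next, c_next

torch.manual_seed(0)
cell = nn.LSTMCell(input_size=3, hidden_size=4)
x, h0, c0 = torch.randn(2, 3), torch.randn(2, 4), torch.randn(2, 4)
h_ref, c_ref = cell(x, (h0, c0))
h_man, c_man = lstm_step(x, h0, c0, cell.weight_ih, cell.weight_hh,
                         cell.bias_ih, cell.bias_hh)
```

The three gates differ only in which slice of the weights they use; each is a sigmoid of the same kind of affine map of the input and the previous hidden state.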

Why use an LSTM

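Because a plain RNN very easily runs into vanishing and exploding gradients. The main cause is that backpropagation through time applies the chain rule across time steps, so the same recurrent weight matrix gets multiplied into the gradient over and over. Roughly speaking, when that repeated factor has norm greater than 1 the gradient can explode; when it is less than 1, the gradient vanishes.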
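
A small numerical sketch of that repeated multiplication, under simplifying assumptions of my own: the recurrent matrix is a scaled orthogonal matrix (so every singular value equals scale) and the derivative of the activation function is ignored. The helper name backprop_norm is hypothetical.

```python
import torch

def backprop_norm(scale, steps=50, dim=8, seed=0):
    """Norm of a unit gradient after `steps` multiplications by W^T,
    where W = scale * Q and Q is orthogonal."""
    gen = torch.Generator().manual_seed(seed)
    q, _ = torch.linalg.qr(torch.randn(dim, dim, generator=gen))
    w = scale * q
    grad = torch.ones(dim) / dim ** 0.5   # unit-norm starting gradient
    for _ in range(steps):
        grad = w.t() @ grad               # one step of backpropagation through time
    return grad.norm().item()

vanishing = backprop_norm(0.5)   # shrinks like 0.5 ** 50
exploding = backprop_norm(1.5)   # grows like 1.5 ** 50
```

After only 50 steps the two cases already differ by more than twenty orders of magnitude, which is why long sequences are so hard for a plain RNN.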
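Through its gates, the LSTM can effectively mitigate vanishing gradients, but (pay attention!!!) exploding gradients can still occur, so gradient clipping is usually added when training an LSTM. In PyTorch, gradient clipping can be done with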

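import torch.nn as nn
# clip_grad_norm_ (with the trailing underscore) clips in place; the older clip_grad_norm is deprecated
nn.utils.clip_grad_norm_(filter(lambda p:p.requires_grad,model.parameters()),max_norm=max_norm)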

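I will not use gradient clipping in the code below; if you need it, add the lines above yourself (between loss.backward() and optimizer.step()). For a detailed analysis of what causes vanishing and exploding gradients, see
http://www.cs.toronto.edu/~rgrosse/courses/csc321_2017/readings/L15%20Exploding%20and%20Vanishing%20Gradients.pdf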
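
For reference, a minimal sketch of where the clipping call sits in a training step. The model, the loss and the threshold max_norm=1.0 are placeholders I made up for illustration, not values from this post.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
max_norm = 1.0                       # hypothetical clipping threshold

x = torch.randn(2, 5, 4)
out, _ = model(x)
loss = out.pow(2).sum() * 1000       # dummy loss, scaled up to inflate the gradients

optimizer.zero_grad()
loss.backward()
# Rescales all gradients in place so that their global norm is at most max_norm;
# the return value is the norm measured before clipping.
norm_before = nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm)
norm_after = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))
optimizer.step()
```

Clipping has to happen after backward() has filled in the gradients and before step() consumes them.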

Why use a BiLSTM

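Bi stands for bidirectional. Using a BiLSTM is actually somewhat controversial: the default order in which humans make sense of a temporal signal is the order in which time passes, so does a time-reversed signal still mean anything? Some say yes; for example, the order of the strokes someone uses to write a character does not really affect our guess of what the character is (I made this example up). Some say no; listening to someone speak backwards doesn't work. Whatever the controversy, in practice a BiLSTM beats an LSTM nine times out of ten, so let's just use it.
The structure of a bidirectional LSTM is quite simple: two unidirectional LSTMs, one reading the sequence forward in time and one reading it backward, each propagate through time and through the stacked layers, and their final outputs are concatenated.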
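
The concatenation is easy to see on the output of nn.LSTM itself. This is a small sketch with sizes I picked arbitrarily: the first hidden_dim features of the output belong to the forward direction and the remaining hidden_dim features to the backward direction.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
B, T, d_input, hidden_dim = 3, 5, 4, 6   # hypothetical sizes
lstm = nn.LSTM(input_size=d_input, hidden_size=hidden_dim,
               batch_first=True, bidirectional=True)
x = torch.randn(B, T, d_input)
out, (h_n, c_n) = lstm(x)                # out: B x T x (2 * hidden_dim)

forward_out = out[:, :, :hidden_dim]     # reads the sequence left to right
backward_out = out[:, :, hidden_dim:]    # reads the sequence right to left
```

The forward direction finishes at the last time step and the backward direction finishes at the first one, which is why h_n[0] matches forward_out[:, -1] and h_n[1] matches backward_out[:, 0].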

Let's first build a BiLSTM for a classification task

First, some notation

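$B$ is the batch size,
$L_i$ is the length of the $i$-th sequence in the batch; $L\in R^B$ is a vector of length $B$
$x(i,0:L_i,0:d_{input})$ is the $i$-th sequence in the batch, with length $L_i$ and $d_{input}$ features per frame; each batch of data $x$ has size $x\in R^{B\times L_{max}\times d_{input}}$ , where $L_{max}$ is the largest value in $L$ , and sequences shorter than $L_{max}$ should be zero-padded beforehand
$y(i,0:L_i)$ holds the class labels of the $i$-th sequence in the batch; each batch of labels $y$ has size $y\in R^{B\times L_{max}}$ , where $L_{max}$ is the largest value in $L$ , and sequences shorter than $L_{max}$ should be padded with $-1$ beforehand (to avoid confusion with class 0; the exact padding value does not matter as long as it cannot be mistaken for a real label)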
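
As a sketch of how such a batch could be assembled, the snippet below pads three made-up sequences with torch.nn.utils.rnn.pad_sequence. The sizes, the random data and the number of classes are placeholders of mine.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

torch.manual_seed(0)
d_input = 3
lengths = torch.tensor([4, 2, 3])                        # L, one entry per sequence
xs = [torch.randn(int(n), d_input) for n in lengths]     # B sequences of frames
ys = [torch.randint(0, 5, (int(n),)) for n in lengths]   # one label per frame

x = pad_sequence(xs, batch_first=True, padding_value=0.0)  # B x L_max x d_input
y = pad_sequence(ys, batch_first=True, padding_value=-1)   # B x L_max
```

Here the second sequence has length 2, so x[1, 2:] is all zeros and y[1, 2:] is all -1.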

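Here I will first build a BiLSTM with PyTorch's native API. Let me start by complaining about how complicated PyTorch makes the handling of variable-length sequences. The basic steps are: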

Prepare data= $x$ , label= $y$ , length= $L$ , etc. as torch.Tensor
Sort the data by length (done by the function sort_batch)
Apply pack_padded_sequence
Feed the packed batch into the LSTM for training
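
The steps above can be sketched end to end as follows. One assumption to flag: this sketch uses the enforce_sorted=False argument of pack_padded_sequence, which recent PyTorch versions provide and which lets the library do the sorting internally; the rest of this post sorts by hand instead. The sizes are arbitrary.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
x = torch.randn(3, 4, 2)              # B x L_max x d_input
length = torch.tensor([2, 4, 3])      # deliberately not sorted
for i, n in enumerate(length):
    x[i, int(n):] = 0                 # zero padding beyond each length

lstm = nn.LSTM(input_size=2, hidden_size=5, batch_first=True)
# Pack so the LSTM skips the padded frames
packed = pack_padded_sequence(x, length, batch_first=True, enforce_sorted=False)
packed_out, _ = lstm(packed)
# Unpack back to a padded B x L_max x hidden tensor, in the original batch order
out, out_length = pad_packed_sequence(packed_out, batch_first=True)
```

The unpacked output is zero at every padded position, and out_length recovers the original lengths.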

The sort_batch function

import numpy as np
import torch

def sort_batch(data,label,length):
    # go through numpy to get the indices that sort the lengths in descending order
    inx=torch.from_numpy(np.argsort(length.numpy())[::-1].copy())
    data=data[inx]
    label=label[inx]
    length=length[inx]
    # from here on length is a plain Python list, no longer a torch.Tensor
    length=length.tolist()
    return (data,label,length)
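
If you want to avoid the numpy round trip, the same sort can be sketched with torch.sort alone. The name sort_batch_torch and the extra restore index (which undoes the sort, handy when predictions must be reported in the original order) are my own additions, not part of sort_batch.

```python
import torch

def sort_batch_torch(data, label, length):
    """Sort a batch by length, longest first, and also return the index
    that puts the sorted batch back into its original order."""
    sorted_length, inx = torch.sort(length, descending=True)
    restore = torch.argsort(inx)      # inverse permutation of inx
    return data[inx], label[inx], sorted_length.tolist(), restore

data = torch.arange(3 * 4 * 2, dtype=torch.float32).reshape(3, 4, 2)
label = torch.tensor([[1, 1, -1, -1], [0, 2, 1, 0], [2, 2, 0, -1]])
length = torch.tensor([2, 4, 3])
s_data, s_label, s_length, restore = sort_batch_torch(data, label, length)
```

Indexing the sorted tensors with restore, as in s_data[restore], gives back the batch in its original order.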

The network

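import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence,pad_packed_sequence

class Net(nn.Module):
    def __init__(self,input_dim,hidden_dim,output_dim,num_layers,biFlag,dropout=0.5):
        # input_dim: input feature dimension d_input
        # hidden_dim: size of the hidden state
        # output_dim: size of the output layer (number of classes)
        # num_layers: number of stacked LSTM layers
        # biFlag: whether to use a bidirectional LSTM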
        super(Net,self).__init__()
        self.input_dim=input_dim
        self.hidden_dim=hidden_dim
        self.output_dim=output_dim
        self.num_layers=num_layers
        if(biFlag):self.bi_num=2
        else:self.bi_num=1
        self.biFlag=biFlag
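        # change the device as needed; falls back to the CPU when CUDA is unavailable
        self.device=torch.device("cuda" if torch.cuda.is_available() else "cpu")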

        # LSTM: input size, hidden size, number of layers, batch_first, dropout ratio, bidirectional or not
        self.layer1=nn.LSTM(input_size=input_dim,hidden_size=hidden_dim, \
                        num_layers=num_layers,batch_first=True, \
                        dropout=dropout,bidirectional=biFlag)
        # linear classification layer followed by a log-softmax over the classes
        self.layer2=nn.Sequential(
            nn.Linear(hidden_dim*self.bi_num,output_dim),
            nn.LogSoftmax(dim=2)
        )
        
        self.to(self.device)

    def init_hidden(self,batch_size):
        # initial hidden state and cell state, both all zeros
        return (torch.zeros(self.num_layers*self.bi_num,batch_size,self.hidden_dim).to(self.device),
                torch.zeros(self.num_layers*self.bi_num,batch_size,self.hidden_dim).to(self.device))

    def forward(self,x,y,length):
        # inputs: raw data x, labels y, and the length vector
        # preparation
        batch_size=x.size(0)
        max_length=int(torch.max(length))
        # truncate to the longest sequence in this batch
        x=x[:,0:max_length,:];y=y[:,0:max_length]
        x,y,length=sort_batch(x,y,length)
        x,y=x.to(self.device),y.to(self.device)
        # pack sequence
        x=pack_padded_sequence(x,length,batch_first=True)

        # run the network
        hidden1=self.init_hidden(batch_size)
        out,hidden1=self.layer1(x,hidden1)
        # out,_=self.layer1(x) is also ok if you don't want to refer to hidden state
        # unpack sequence
        out,length=pad_packed_sequence(out,batch_first=True)
        out=self.layer2(out)
        # return the (sorted) true labels, the predicted log-probabilities, and the length vector
        return y,out,length
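
Because forward returns the sorted labels next to the log-probabilities, a loss can be computed on them directly. The post does not show its loss function, so what follows is only a sketch of one reasonable choice: nn.NLLLoss with ignore_index=-1, so that the frames padded with -1 do not contribute. The random tensor stands in for the network output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
B, T, C = 2, 4, 3                                        # hypothetical sizes
out = torch.log_softmax(torch.randn(B, T, C), dim=2)     # stand-in for the network output
y = torch.tensor([[0, 2, 1, 1], [1, 0, -1, -1]])         # -1 marks padded frames

criterion = nn.NLLLoss(ignore_index=-1)
# NLLLoss expects (N, C) log-probabilities and (N,) targets, so flatten batch and time
loss = criterion(out.reshape(-1, C), y.reshape(-1))

# The same value computed by hand over the valid frames only
mask = y != -1
manual = -out[mask].gather(1, y[mask].unsqueeze(1)).mean()
```

The loss is averaged over the six valid frames only; the two padded frames are skipped.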

The official BiLSTM has a flaw

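The code above looks fine, but it has one intolerable problem: it is non-reproducible. That is, this bidirectional LSTM produces different results on every run (even after fixing all random seeds). Honestly, for a lowly researcher this is fatal, so reproducibility is the most basic requirement I have of a model.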

In my experiments, the LSTM is non-reproducible in the following case:

nn.LSTM with bidirectional=True and dropout>0

In my experiments, the LSTM is reproducible in the following cases:

nn.LSTM with bidirectional=True and dropout=0
nn.LSTM with bidirectional=False

In other words, adding dropout to a bidirectional LSTM makes it non-reproducible. This is reportedly a cuDNN issue that PyTorch itself cannot fix; for details see
https://discuss.pytorch.org/t/non-deterministic-result-on-multi-layer-lstm-with-dropout/9700
https://github.com/soumith/cudnn.torch/issues/197
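
A check of this kind can be sketched as below: build the same model twice from the same seed, run it in training mode, and compare the outputs bit for bit. The helper names and sizes are mine. The sketch runs on the CPU, where it should report True; the behaviour described in this post concerns the cuDNN path, so to probe it you would pass device="cuda", and the outcome there may depend on your PyTorch and cuDNN versions.

```python
import torch
import torch.nn as nn

def run_once(seed, bidirectional, dropout, device="cpu"):
    """Build a small LSTM from a fixed seed and return its output in training mode."""
    torch.manual_seed(seed)
    lstm = nn.LSTM(input_size=4, hidden_size=8, num_layers=2, batch_first=True,
                   dropout=dropout, bidirectional=bidirectional).to(device)
    lstm.train()                          # dropout is only active in training mode
    x = torch.randn(3, 5, 4, device=device)
    out, _ = lstm(x)
    return out.detach().cpu()

def is_reproducible(**kwargs):
    """True when two runs from the same seed give identical outputs."""
    return torch.equal(run_once(0, **kwargs), run_once(0, **kwargs))

cpu_check = is_reproducible(bidirectional=True, dropout=0.5)
```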

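Being obsessive about this, I obviously cannot tolerate non-reproducibility. Fortunately the unidirectional LSTM is reproducible, so the only option is to build a bidirectional LSTM myself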

If you want it done right, do it yourself

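We need a new function here, reverse_padded_sequence, which reverses the sequences (think of it as reversing the second dimension, of size $L_{max}$ , of the batch $x\in R^{B\times L_{max}\times d_{input}}$ while leaving the zero-padded positions where they are; it behaves like tf.reverse_sequence in TensorFlow)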

import torch

def reverse_padded_sequence(inputs, lengths, batch_first=True):
    """Reverses sequences according to their lengths.
    The original version of this function took a Variable; Variable was
    merged into Tensor in PyTorch 0.4.0, so a plain tensor is passed here.
    Inputs should have size ``T x B x *`` if ``batch_first`` is False, or
    ``B x T x *`` if True. T is the length of the longest sequence (or larger),
    B is the batch size, and * is any number of dimensions (including 0).
    Arguments:
        inputs (Tensor): padded batch of variable length sequences.
        lengths (list[int]): list of sequence lengths
        batch_first (bool, optional): if True, inputs should be B x T x *.
    Returns:
        A Tensor with the same size as inputs, but with each sequence
        reversed according to its length.
    """
    if batch_first:
        inputs = inputs.transpose(0, 1)
    max_length, batch_size = inputs.size(0), inputs.size(1)
    if len(lengths) != batch_size:
        raise ValueError("inputs is incompatible with lengths.")
    # for each sequence: reversed indices for the valid part, unchanged indices for the padding
    ind = [list(reversed(range(0, length))) + list(range(length, max_length))
           for length in lengths]
    ind = torch.LongTensor(ind).transpose(0, 1)
    for dim in range(2, inputs.dim()):
        ind = ind.unsqueeze(dim)
    # broadcast the index to the input shape and move it to the input's device
    ind = ind.expand_as(inputs).to(inputs.device)
    reversed_inputs = torch.gather(inputs, 0, ind)
    if batch_first:
        reversed_inputs = reversed_inputs.transpose(0, 1)
    return reversed_inputs
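
To see what "reverse within each length" means on actual numbers, here is a compact re-implementation of the same idea for batch-first inputs, followed by a tiny example. It is an illustrative sketch, not a replacement for reverse_padded_sequence; the name reverse_by_length is hypothetical.

```python
import torch

def reverse_by_length(x, lengths):
    """Reverse each sequence of a B x T (x ...) batch within its own length,
    leaving the padded positions untouched."""
    lengths = torch.as_tensor(lengths, device=x.device)
    steps = torch.arange(x.size(1), device=x.device).unsqueeze(0)   # 1 x T
    # Inside the valid region position t maps to length-1-t; padding keeps its index
    ind = torch.where(steps < lengths.unsqueeze(1),
                      lengths.unsqueeze(1) - 1 - steps, steps)      # B x T
    return torch.stack([row[i] for row, i in zip(x, ind)])

x = torch.tensor([[1, 2, 3, 0],
                  [4, 5, 0, 0],
                  [6, 7, 8, 9]])
rev = reverse_by_length(x, [3, 2, 4])
# rev is [[3, 2, 1, 0],
#         [5, 4, 0, 0],
#         [9, 8, 7, 6]]
```

Applying the function twice with the same lengths returns the original batch, which is what allows the output of the backward LSTM to be flipped back into normal time order.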

Next we build the bidirectional LSTM network by hand; it is largely the same as before

class Net(nn.Module):
    def __init__(self,input_dim,hidden_dim,output_dim,num_layers,biFlag,dropout=0.5):
        super(Net,self).__init__()
        self.input_dim=input_dim
        self.hidden_dim=hidden_dim
        self.output_dim=output_dim
        self.num_layers=num_layers
        if(biFlag):self.bi_num=2
        else:self.bi_num=1
        self.biFlag=biFlag
        # change the device as needed; falls back to the CPU when CUDA is unavailable
        self.device=torch.device("cuda" if torch.cuda.is_available() else "cpu")

        # layer1[0] reads the sequence forward; every LSTM in this list is unidirectional
        self.layer1=nn.ModuleList()
        self.layer1.append(nn.LSTM(input_size=input_dim,hidden_size=hidden_dim, \
                        num_layers=num_layers,batch_first=True, \
                        dropout=dropout,bidirectional=False))
        if(biFlag):
            # if bidirectional, add a second LSTM that reads the reversed sequence
            self.layer1.append(nn.LSTM(input_size=input_dim,hidden_size=hidden_dim, \
                        num_layers=num_layers,batch_first=True, \
                        dropout=dropout,bidirectional=False))

        self.layer2=nn.Sequential(
            nn.Linear(hidden_dim*self.bi_num,output_dim),
            nn.LogSoftmax(dim=2)
        )

        self.to(self.device)

    def init_hidden(self,batch_size):
        # each LSTM in layer1 is unidirectional, so its state has num_layers entries (not num_layers*bi_num)
        return (torch.zeros(self.num_layers,batch_size,self.hidden_dim).to(self.device),
                torch.zeros(self.num_layers,batch_size,self.hidden_dim).to(self.device))

    def forward(self,x,y,length):
        batch_size=x.size(0)
        max_length=int(torch.max(length))
        x=x[:,0:max_length,:];y=y[:,0:max_length]
        x,y,length=sort_batch(x,y,length)
        x,y=x.to(self.device),y.to(self.device)
        hidden=[ self.init_hidden(batch_size) for l in range(self.bi_num)]

        # out[0] is the original batch, out[1] the batch reversed within each length
        out=[x,reverse_padded_sequence(x,length,batch_first=True)]
        for l in range(self.bi_num):
            # pack sequence
            out[l]=pack_padded_sequence(out[l],length,batch_first=True)
            out[l],hidden[l]=self.layer1[l](out[l],hidden[l])
            # unpack
            out[l],_=pad_packed_sequence(out[l],batch_first=True)
            # for the reversed direction, flip the output back so the time steps line up
            if(l==1):out[l]=reverse_padded_sequence(out[l],length,batch_first=True)

        if(self.bi_num==1):out=out[0]
        else:out=torch.cat(out,2)
        # out keeps the shape B x L_max x output_dim, as in the first network
        out=self.layer2(out)
        return y,out,length

Done! In my tests this network is reproducible
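
As a sanity check on the idea itself, here is a sketch showing that, for a single layer without dropout, two unidirectional LSTMs plus a reversal give the same output as nn.LSTM with bidirectional=True once the weights are copied over. Assumptions of mine: full-length sequences (so torch.flip is enough to reverse them) and arbitrary small sizes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_input, hidden_dim = 4, 6                            # hypothetical sizes
bi = nn.LSTM(d_input, hidden_dim, batch_first=True, bidirectional=True)
fwd = nn.LSTM(d_input, hidden_dim, batch_first=True)
bwd = nn.LSTM(d_input, hidden_dim, batch_first=True)

# Copy the weights of each direction into its own unidirectional LSTM
with torch.no_grad():
    for name in ["weight_ih_l0", "weight_hh_l0", "bias_ih_l0", "bias_hh_l0"]:
        getattr(fwd, name).copy_(getattr(bi, name))
        getattr(bwd, name).copy_(getattr(bi, name + "_reverse"))

x = torch.randn(3, 5, d_input)                        # full-length sequences, no padding
ref, _ = bi(x)
out_f, _ = fwd(x)
out_b, _ = bwd(torch.flip(x, dims=[1]))               # feed the sequence backwards
manual = torch.cat([out_f, torch.flip(out_b, dims=[1])], dim=2)
```

One caveat worth knowing: with num_layers greater than 1 the two constructions are no longer the same model. In nn.LSTM every layer after the first receives the concatenated outputs of both directions from the layer below, whereas two separate stacks only meet at the final concatenation.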

Appendix

Fixing the random seeds in PyTorch

import torch
import numpy as np
seed=1234  # any fixed integer works
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
np.random.seed(seed)
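
The same calls can be wrapped in a helper. The sketch below also seeds Python's random module and sets two cuDNN flags, torch.backends.cudnn.deterministic and torch.backends.cudnn.benchmark; those additions are my assumption of what is commonly used alongside seeding, not something the experiments in this post relied on, and I have not verified that they cure the bidirectional dropout issue described above.

```python
import random
import numpy as np
import torch

def set_seed(seed):
    """Seed every random number generator the training code is likely to touch."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)      # silently ignored when CUDA is unavailable
    # Assumption: ask cuDNN for deterministic kernels and turn off autotuning
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(1234)
first = (random.random(), np.random.rand(), torch.rand(1).item())
set_seed(1234)
second = (random.random(), np.random.rand(), torch.rand(1).item())
```

Calling set_seed once at the very start of a script, before the model is built, keeps the weight initialisation covered as well.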
