PyTorch Notes

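1. Indexing in PyTorch: index_select(x, dim, indices)
dim is the dimension to index along, and indices holds the positions to select. indices is usually passed as an integer tensor such as torch.LongTensor([1,2]). See also slicing in PyTorch.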
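
A minimal sketch of index_select along both dimensions of a small 2-D tensor. The tensor contents and the variable names are made up for illustration.

```python
import torch

x = torch.arange(12).view(3, 4)      # rows: [0..3], [4..7], [8..11]
indices = torch.tensor([0, 2])       # integer tensors default to torch.int64 (long)

rows = torch.index_select(x, 0, indices)   # rows 0 and 2 -> shape (2, 4)
cols = torch.index_select(x, 1, indices)   # columns 0 and 2 -> shape (3, 2)
print(rows)
print(cols)
```

The result always has the same number of dimensions as the input; only the size along dim changes.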

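2. .data was the main way to get the underlying Tensor out of a Variable. Since PyTorch 0.4, Variable and Tensor are merged, and .detach() is the recommended way to get a tensor that is cut off from the graph.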

3. Every Optimizer implements a step() method that updates all of the parameters it was given.
loss.backward() runs backpropagation. # This call will compute the gradient of loss with respect to all Tensors with requires_grad=True.
optimizer.step() updates the parameters.
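
The backward/step pair can be sketched as a single training step. The toy model, the random data, and the learning rate below are assumptions chosen only to make the example self-contained.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(3, 1)                           # toy model, for illustration only
optimizer = optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(8, 3)
y = torch.randn(8, 1)

before = model.weight.detach().clone()
optimizer.zero_grad()                             # clear gradients left over from an earlier step
loss = nn.functional.mse_loss(model(x), y)
loss.backward()                                   # fills .grad on every parameter that requires grad
optimizer.step()                                  # updates the parameters from those gradients
changed = not torch.equal(before, model.weight.detach())
print(changed)
```

Calling zero_grad() before backward() matters because gradients accumulate across calls by default.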

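4. torch.squeeze(input, dim=None, out=None)
Removes dimensions of size 1. Picture x, y, and z as three axes: if the z axis holds only a single slice, the data can be treated as two-dimensional. If dim is given, only that dimension is squeezed, and only when its size is 1.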
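
A short sketch of how squeeze behaves with and without dim, on a tensor whose shape was picked for illustration.

```python
import torch

x = torch.zeros(2, 1, 3, 1)

a = torch.squeeze(x)           # every size-1 dimension removed
b = torch.squeeze(x, dim=1)    # only dim 1 removed
c = torch.squeeze(x, dim=0)    # dim 0 has size 2, so nothing changes
d = torch.unsqueeze(a, 0)      # unsqueeze is the inverse: it inserts a size-1 dimension

print(a.shape, b.shape, c.shape, d.shape)
```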

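torch.Tensor.view() returns a tensor with a new shape that shares the same underlying data as the original; it does not copy. reshape() behaves the same way when it can, and falls back to copying when the tensor is not contiguous.
If one dimension is given as -1, its size is inferred from the total number of elements and the sizes of the other dimensions.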

import torch

x = torch.randn(2, 4)
# 1.5600 -1.6180 -2.0366  2.7115
# 0.8415 -1.0103 -0.4793  1.5734
# [torch.FloatTensor of size 2x4]
y = x.view(4, 2)
print(y)

# Output:
 1.5600 -1.6180
-2.0366  2.7115
 0.8415 -1.0103
-0.4793  1.5734
[torch.FloatTensor of size 4x2]

5. Converting to and from NumPy: torch.from_numpy() and Tensor.numpy()
See: PyTorch variable type conversion
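
A minimal sketch of the round trip between NumPy and PyTorch. Note that from_numpy() and numpy() share memory with their source (for CPU tensors), while a dtype conversion such as .long() creates a new tensor. Variable names are illustrative.

```python
import numpy as np
import torch

a = np.ones(3, dtype=np.float32)
t = torch.from_numpy(a)     # t shares memory with a
a[0] = 5.0                  # the change is visible through t as well
b = t.numpy()               # b shares memory with t (CPU tensors only)
t_long = t.long()           # dtype conversion returns a new tensor

print(t, b, t_long.dtype)
```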

6. torch.nn.Embedding
Official documentation:

Embedding

Reference: word_embeddings_tutorial. My notes follow.
word embeddings are a representation of the semantics of a word, efficiently encoding semantic information that might be relevant to the task at hand
Central to the idea of deep learning is that the neural network learns representations of the features, rather than requiring the programmer to design them herself. So why not just let the word embeddings be parameters in our model, and then be updated during training? This is exactly what we will do.
Similar to how we defined a unique index for each word when making one-hot vectors, we also need to define an index for each word when using embeddings. These will be keys into a lookup table.
To index into this table, you must use torch.LongTensor (since the indices are integers, not floats).
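To use it, first build a vocabulary over every word in the corpus and assign each word a unique integer index (the same indices one would use to build one-hot vectors).
When instantiating the Embedding class, pass the vocabulary size and the dimensionality of the embeddings: (num_embeddings, embedding_dim, padding_idx). If padding_idx (an int) is given, the embedding vector at that index is initialized to zeros and is not updated during training, so it can stand for the padding token.
The input must be an integer index tensor such as a torch.LongTensor. There are three ways to get one: (1) pass dtype=torch.long to torch.tensor(), (2) build it with torch.LongTensor([...]), or (3) convert an existing tensor with data.long().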
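
A minimal sketch of an embedding lookup with padding. The three-word vocabulary `word_to_ix` and the embedding size are hypothetical, chosen only for illustration.

```python
import torch
import torch.nn as nn

word_to_ix = {"<pad>": 0, "hello": 1, "world": 2}   # hypothetical vocabulary
embeds = nn.Embedding(num_embeddings=len(word_to_ix), embedding_dim=5, padding_idx=0)

# A batch of 2 sentences with seq_len 3, padded with index 0
inputs = torch.tensor([[1, 2, 0], [2, 0, 0]], dtype=torch.long)
out = embeds(inputs)                  # shape (batch_size, seq_len, embedding_dim)
print(out.shape)
print(out[0, 2])                      # the vector for padding_idx is all zeros

# Every index must lie in [0, num_embeddings); checking up front avoids an
# "index out of range" error inside the lookup.
in_range = bool(inputs.min() >= 0) and bool(inputs.max() < embeds.num_embeddings)
print(in_range)
```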

Pitfall, 2019.1.23:

Traceback (most recent call last):
  File "/Users/yumi/Documents/Code/pytorch_related/classifier/CNN+pooling/code/CNN.py", line 112, in <module>
    pred = model(inputs)
  File "/anaconda3/python.app/Contents/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/yumi/Documents/Code/pytorch_related/classifier/CNN+pooling/code/CNN.py", line 89, in forward
    out = self.embedded(x)
  File "/anaconda3/python.app/Contents/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/anaconda3/python.app/Contents/lib/python3.6/site-packages/torch/nn/modules/sparse.py", line 118, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/anaconda3/python.app/Contents/lib/python3.6/site-packages/torch/nn/functional.py", line 1454, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: index out of range at /Users/administrator/nightlies/pytorch-1.0.0/wheel_build_dirs/conda_3.6/conda/conda-bld/pytorch_1544137972173/work/aten/src/TH/generic/THTensorEvenMoreMath.cpp:191

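The cause was that inputs contained negative numbers. After debugging against older code that worked, I concluded that every index passed in here must be non-negative, since a vocabulary never assigns a negative index.
(On 2019.01.26 the same error came back. This time the vocabulary size I had set was too small, so some indices fell outside the embedding table. Every index must lie in [0, num_embeddings).)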

P.S. The inputs passed in here have shape (batch_size, seq_len).

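7. torch.nn.LSTM()
Official documentation:

LSTM

References:
LSTM: PyTorch implementation
Understanding PyTorch's LSTM
On the output format of LSTM in PyTorch
Parameters:

input_size: number of features in the input x
hidden_size: number of features in the hidden state
num_layers: number of stacked LSTM layers; default 1
bias: if False, the layer does not use the bias weights b_ih and b_hh; default True
batch_first: if True, input and output tensors are laid out as (batch, seq, feature)
dropout: applies dropout to the output of every layer except the last; default 0
bidirectional: if True, the LSTM is bidirectional; default False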

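Input format

input(seq_len, batch, input_size)
seq_len is the length of the sequence, i.e. the number of tokens in the input. The second dimension is the batch size; set it to 1 to feed a single sequence through the network. The third dimension is the size of each input vector, usually the word embedding dimension.
h0(num_layers * num_directions, batch, hidden_size)
c0(num_layers * num_directions, batch, hidden_size)

Output format

output(seq_len, batch, hidden_size * num_directions)
output holds the hidden state h of the last layer at every time step. For a bidirectional LSTM, the output at each time step is h = [h_forward, h_backward], the forward and backward h of the same time step concatenated.
hn(num_layers * num_directions, batch, hidden_size)
h_n holds the hidden state h of every layer at its last time step. For a bidirectional LSTM, the final forward and backward hidden states are stored separately.
cn(num_layers * num_directions, batch, hidden_size)
c_n holds the cell state c of every layer at its last time step, laid out the same way as h_n.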

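You will often see a call like lstm(i.view(1,1,-1), hidden). This simply reshapes the input into the three-dimensional (seq_len, batch, input_size) form. Some people write the hidden argument as (h0, c0) and initialize the two separately; here we initialize both up front and bundle them into one tuple.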

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)
lstm = nn.LSTM(3,3)
# seq_len is 5, i.e. 5 time steps
inputs = [torch.randn(1,3) for _ in range(5)]
# print(inputs)
# print(torch.randn(1,1,3))

# The initial state could also be defined inside a module; its shape is (num_layers * num_directions, batch, hidden_size)
hidden = (torch.randn(1,1,3),torch.randn(1,1,3))


# Feed the sequence one element at a time
for i in inputs:
    # print(i.view(1,1,-1))
    out,hidden = lstm(i.view(1,1,-1),hidden)
    print("out\n",out)
    print("hidden\n",hidden)

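# Now feed the whole sequence at once. The first value LSTM returns holds the hidden state at every
# time step; the second holds the most recent hidden state and cell state.
# So the last slice of out_all equals hidden_all[0], the final hidden state h_n.
# out_all is the output of the last layer at every time step; there is only one LSTM layer here.
# The first tensor in hidden_all, h_n, is the hidden state at the last time step.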
inputs = torch.cat(inputs).view(len(inputs),1,-1)
out_all,hidden_all = lstm(inputs,hidden)
print(out_all)
print(hidden_all)
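
To make the shapes listed above concrete, here is a sketch with a 2-layer bidirectional LSTM. All sizes are arbitrary values picked for illustration. It also checks how h_n relates to output: the final forward state matches the last time step of output, and the final backward state matches the first time step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch, input_size, hidden_size, num_layers = 5, 2, 3, 4, 2
lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers, bidirectional=True)

x = torch.randn(seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)      # h0 and c0 default to zeros when omitted

print(output.shape)               # (seq_len, batch, hidden_size * 2)
print(h_n.shape)                  # (num_layers * 2, batch, hidden_size)
print(c_n.shape)                  # same layout as h_n

# Separate the layer and direction axes of h_n
h_split = h_n.view(num_layers, 2, batch, hidden_size)
fwd_last = h_split[-1, 0]         # last layer, forward direction
bwd_last = h_split[-1, 1]         # last layer, backward direction

fwd_match = torch.allclose(output[-1, :, :hidden_size], fwd_last)
bwd_match = torch.allclose(output[0, :, hidden_size:], bwd_last)
print(fwd_match, bwd_match)
```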

8. Generating random numbers: https://zhuanlan.zhihu.com/p/31231210

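9. NLLLoss takes a vector of log-probabilities and a target label. It does not compute the log-probabilities for us, so the last layer of the network should be log_softmax. nn.CrossEntropyLoss() is the same as NLLLoss(), except that it applies log_softmax for us, so it takes raw scores (logits) directly.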
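
The relationship between the two losses can be checked numerically. This is a sketch on random logits; the shapes and targets are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)              # (N, C) raw scores
target = torch.tensor([0, 2, 1, 2])     # (N,) class indices

ce = nn.CrossEntropyLoss()(logits, target)                  # prediction first, target second
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)    # needs log-probabilities
same = torch.allclose(ce, nll)
print(ce.item(), nll.item(), same)
```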

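10. Loading data in PyTorch (Dataset, DataLoader, DataLoaderIter)
References:
Loading data in PyTorch
Reading the PyTorch source: torch.utils.data.DataLoader
Dataset
DataLoader
DataLoaderIter
Each one is wrapped by the next, from top to bottom.
When the code starts pulling data from a torch.utils.data.DataLoader object, the DataLoader's __iter__ method is called. In the source I read, it is a single line, return DataLoaderIter(self), whose argument is the DataLoader itself with all of its attributes. So calling __iter__ brings in another class, DataLoaderIter.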

import torch
import torch.utils.data as Data
torch.manual_seed(1)

BATCH_SIZE = 5
x = torch.linspace(1,10,10)
y = torch.linspace(10,1,10)

torch_dataset = Data.TensorDataset(x,y)
loader = Data.DataLoader(
    dataset=torch_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True
)
# Train over the whole dataset 3 times
for epoch in range(3):
    # 10 samples in total with batch_size 5, so each epoch runs 2 steps
    for step,(batch_x,batch_y) in enumerate(loader):
        print("Epoch:",epoch,"|step:",step,"|batch x:",batch_x.numpy(),"|batch y:",batch_y.numpy())
print("\n")

Output:
Epoch: 0 |step: 0 |batch x: [ 5.  7. 10.  3.  4.] |batch y: [6. 4. 1. 8. 7.]
Epoch: 0 |step: 1 |batch x: [2. 1. 8. 9. 6.] |batch y: [ 9. 10.  3.  2.  5.]
Epoch: 1 |step: 0 |batch x: [ 4.  6.  7. 10.  8.] |batch y: [7. 5. 4. 1. 3.]
Epoch: 1 |step: 1 |batch x: [5. 3. 2. 1. 9.] |batch y: [ 6.  8.  9. 10.  2.]
Epoch: 2 |step: 0 |batch x: [ 4.  2.  5.  6. 10.] |batch y: [7. 9. 6. 5. 1.]
Epoch: 2 |step: 1 |batch x: [3. 9. 1. 8. 7.] |batch y: [ 8.  2. 10.  3.  4.]

# With batch_size 8 the data does not split evenly, so the second batch holds whatever is left
# e.g.
torch_dataset2 = Data.TensorDataset(x,y)
loader2 = Data.DataLoader(
    dataset=torch_dataset2,
    batch_size= 8 ,
    shuffle=True
)
for epoch in range(3):
    # 10 samples in total with batch_size 8, so each epoch runs 2 steps (8 samples, then 2)
    for step, (batch_x, batch_y) in enumerate(loader2):
        print("Epoch:", epoch, "|step:", step, "|batch x:", batch_x.numpy(), "|batch y:", batch_y.numpy())

Output:
Epoch: 0 |step: 0 |batch x: [ 4. 10.  9.  8.  7.  6.  1.  2.] |batch y: [ 7.  1.  2.  3.  4.  5. 10.  9.]
Epoch: 0 |step: 1 |batch x: [5. 3.] |batch y: [6. 8.]
Epoch: 1 |step: 0 |batch x: [9. 8. 4. 6. 5. 3. 7. 2.] |batch y: [2. 3. 7. 5. 6. 8. 4. 9.]
Epoch: 1 |step: 1 |batch x: [10.  1.] |batch y: [ 1. 10.]
Epoch: 2 |step: 0 |batch x: [ 5.  1.  3.  7.  6. 10.  9.  8.] |batch y: [ 6. 10.  8.  4.  5.  1.  2.  3.]
Epoch: 2 |step: 1 |batch x: [2. 4.] |batch y: [9. 7.]
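
TensorDataset is one ready-made Dataset. As a sketch of the Dataset layer itself, a custom dataset only needs __len__ and __getitem__. The class name `PairDataset` is hypothetical. The example also shows drop_last=True, which discards the short final batch instead of returning it.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class PairDataset(Dataset):               # hypothetical name, for illustration
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

dataset = PairDataset(torch.linspace(1, 10, 10), torch.linspace(10, 1, 10))
keep_loader = DataLoader(dataset, batch_size=8, shuffle=True)
drop_loader = DataLoader(dataset, batch_size=8, shuffle=True, drop_last=True)

sizes_keep = [len(batch_x) for batch_x, _ in keep_loader]   # leftover batch is kept
sizes_drop = [len(batch_x) for batch_x, _ in drop_loader]   # leftover batch is dropped
print(sizes_keep, sizes_drop)
```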

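11. Autograd
from torch.autograd import Variable
Once a Tensor is wrapped in a Variable, calling .backward() runs backpropagation and computes all gradients automatically. A Variable has three attributes: data, grad, and grad_fn. (Since PyTorch 0.4, Tensor and Variable are merged, so a plain Tensor with requires_grad=True does the same job.)

y = x.detach() does what its name says: it returns a Tensor y that is detached from the computation graph. y still shares storage with x, so if y is modified in place and x is needed for the backward pass, autograd detects the change and raises an error.
requires_grad is an attribute of Tensor itself, and it can also be passed as an argument when constructing a tensor.
https://www.itency.com/topic/show.do?id=494122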

# Learnable parameters can be inspected via net.parameters() (net is any nn.Module)
params = list(net.parameters())

# Get the learnable parameters together with their names
for name,parameters in net.named_parameters():
    print(name,":",parameters.size())

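The graph has Variable nodes and Function nodes: Variables record the data, Functions record the operations. Variable nodes are either leaf or non-leaf nodes. Leaf nodes are created directly by the user; non-leaf nodes are produced by operations on other Variables.
If a node was created by the user, it is a leaf and its grad_fn is None.
grad_fn shows the backward function of a variable, and grad_fn.next_functions[0][0] gives the next function back along the graph.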
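
A small sketch of leaf and non-leaf nodes using plain tensors (the post-0.4 equivalent of Variable). The values are arbitrary.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)   # created by the user: a leaf
y = x * 3                                  # produced by an operation: a non-leaf
z = y.sum()

print(x.is_leaf, x.grad_fn)                # leaf, so grad_fn is None
print(y.is_leaf, y.grad_fn)                # non-leaf, grad_fn is a Mul backward node
print(z.grad_fn.next_functions)            # links back toward y's backward function

z.backward()
print(x.grad)                              # dz/dx = 3 for every element

d = y.detach()                             # same values, cut off from the graph
print(d.requires_grad)
```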

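12. pack_padded_sequence vs. pad_packed_sequence

torch.nn.utils.rnn.pack_padded_sequence(input, lengths, batch_first=False)
The sequences in input must be sorted by length in descending order, and lengths must be sorted to match. (Later PyTorch releases add an enforce_sorted argument; with enforce_sorted=False the sorting is handled internally.)
Shape of input: T x B x *, where T is the length of the longest sequence and B is the batch size. With batch_first=True the shape is B x T x *.
lengths: the length of each sequence in the batch (Expected len(lengths) to be equal to batch_size)

The return value is a PackedSequence object with two fields, data and batch_sizes. data holds the values of input read column by column (time step by time step), and batch_sizes holds, for each time step, the number of sequences that remain once the zero padding is excluded.
See: How to pad variable-length sequences for RNN input in PyTorch

One way to do the sorting: sort lengths first, then use index_select to reorder input accordingly.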

# Descending order, method 1
sort_index = np.argsort(-np.array(lengths))

# Descending order, method 2
lengths = torch.tensor(lengths)
_, idx_sort = torch.sort(lengths, dim=0, descending=True)

See also: An analysis of pack_padded_sequence and pad_packed_sequence in PyTorch
PyTorch study notes (21): using pack_padded_sequence

My own approach:

import numpy as np
import torch
import torch.nn as nn

data = [[1,1,1,1],[2,2,2,2,2],[3,3,3,3,3,3],[4,4,4,4,4,4,4]]
data_length = [len(item) for item in data]
max_len = np.max(data_length)
# Zero-pad every sequence to max_len
new_array = np.zeros((len(data), max_len))
for index, seq in enumerate(data):
    new_array[index][:len(seq)] = seq
data_tensor = torch.from_numpy(new_array)
_, indices = torch.sort(torch.tensor(data_length), descending=True)
print(data_tensor[indices])

# Sort the padded batch by length (descending), then pack it
lengths = torch.tensor(data_length)
_, idx_sort = torch.sort(lengths, dim=0, descending=True)
x = torch.index_select(data_tensor, 0, idx_sort)
x_packed = nn.utils.rnn.pack_padded_sequence(input=x, lengths=lengths[idx_sort], batch_first=True)

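Feeding this PackedSequence into the network produces another PackedSequence, which is where pad_packed_sequence comes in: it is the inverse of pack_padded_sequence.
torch.nn.utils.rnn.pad_packed_sequence(sequence, batch_first=False, padding_value=0.0, total_length=None)
The padded output has shape T x B x *, or B x T x * when batch_first=True.
It returns a tuple: the first element is the padded tensor described above,
and the second is the length of each sequence in the batch.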
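
The full round trip (pad, pack, run the LSTM, unpack) can be sketched as follows. The batch contents, feature size, and hidden size are made up for illustration; the sequences are already sorted by length.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
# 3 sequences of lengths 4, 2, 1, zero-padded to length 4, with feature size 1
padded = torch.tensor([[1., 2., 3., 4.],
                       [5., 6., 0., 0.],
                       [7., 0., 0., 0.]]).unsqueeze(-1)    # (B, T, 1)
lengths = torch.tensor([4, 2, 1])

packed = pack_padded_sequence(padded, lengths, batch_first=True)
print(packed.batch_sizes)         # sequences still active at each time step

lstm = nn.LSTM(input_size=1, hidden_size=2, batch_first=True)
packed_out, _ = lstm(packed)      # the output is a PackedSequence too

out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
print(out.shape)                  # (B, T, hidden_size)
print(out_lengths)                # original lengths are recovered
```

Positions past each sequence's length come back filled with padding_value (0.0 by default), so the LSTM never processes the padding.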

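13. eval is evaluation mode and train is training mode. Switching between them only matters when the model contains layers such as Dropout or BatchNorm. During training both are active; at test time dropout is normally turned off and BatchNorm uses the running statistics kept from training. So switch to evaluation mode before testing.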

14. nn.Linear(in_features, out_features, bias=True)
Parameters: in_features is the size of each input sample, out_features is the size of each output sample.
Input: (N, *, in_features), where * means any number of additional dimensions

15. When making predictions, wrap the forward pass in with torch.no_grad().
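
A sketch combining evaluation mode with torch.no_grad() at prediction time. The tiny model is an assumption made only so the example runs on its own.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))   # toy model, for illustration
x = torch.ones(1, 4)

model.eval()                      # dropout becomes a no-op
with torch.no_grad():             # no graph is recorded inside this block
    a = model(x)
    b = model(x)

deterministic = torch.equal(a, b) # identical outputs, since dropout is off
print(deterministic, a.requires_grad)

model.train()                     # switch back before resuming training
```

The two switches are independent: eval() changes layer behavior, while no_grad() only turns off gradient tracking.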

16. torch.transpose(input, dim0, dim1)
Returns a transposed version of input with dimensions dim0 and dim1 swapped. The output tensor shares memory with the input tensor.

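17. torch.nn.functional.max_pool1d
Input shape: minibatch x in_channels x iW
This matches torch.nn.MaxPool1d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False),
where the input has shape (N, C, Lin) and the output has shape (N, C, Lout).
Formula for Lout:

Lout = floor((Lin + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)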
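
The output-length formula from the MaxPool1d documentation can be checked against an actual call. The helper `pooled_length` is a hypothetical function written for this sketch, and the tensor sizes are arbitrary.

```python
import math
import torch
import torch.nn.functional as F

def pooled_length(l_in, kernel_size, stride=None, padding=0, dilation=1):
    """Lout for max_pool1d with ceil_mode=False (hypothetical helper)."""
    stride = kernel_size if stride is None else stride   # stride defaults to kernel_size
    return math.floor((l_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)

x = torch.randn(20, 16, 50)                   # (N, C, Lin)
out = F.max_pool1d(x, kernel_size=3, stride=2)
expected = pooled_length(50, kernel_size=3, stride=2)
print(out.shape, expected)
```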

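18. nn.CrossEntropyLoss()
The first argument is y_pred and the second is y_target. Today's pitfall was passing them in the wrong order, and the error message left me baffled. According to the official documentation, input has shape (N, C) and holds the raw score for each class, while target has shape (N) and holds the class index of each sample, with 0 <= target[i] <= C-1.

The figure below may also help in understanding what the output dimensions mean:

[Figure: image.png]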

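19. torch.max(input, dim, keepdim=False, out=None)
input is a tensor.
dim=1 takes the maximum of each row.
dim=0 takes the maximum of each column.
The return value is a (Tensor, LongTensor) pair: the first holds the maximum values and the second holds their indices.

How is it used? For example, to count correct predictions: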

##### Output of a binary classifier #####
pred = torch.tensor([[0.1,-0.2],[0.3,0.5],[0.2,-0.1]])
label = torch.tensor([0,0,1])
# pred_label is the predicted class; label is the ground-truth class
pred_label = torch.max(pred,1)[1]
print(pred_label)
print(float((pred_label == label).sum())/len(label)*100)

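20. Pitfall, 2019.01.16:
On reading values out of a tensor: I read somewhere (I forget where) that taking values with tensor.data and accumulating them adds to the dynamic graph? I am not sure about this. (.data itself returns a tensor that is detached from the graph; what does keep the graph alive is accumulating the tensor itself, e.g. total += loss.)
If a tensor has exactly one element, tensor.item() returns it as a plain Python number. If it has several elements, tensor.tolist() returns them as a list.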
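
A minimal sketch of item() and tolist(), including the common pattern of accumulating a running loss as a Python float. The values are arbitrary.

```python
import torch

loss = torch.tensor(2.5, requires_grad=True) * 2   # stands in for a real loss value

total = 0.0
total += loss.item()          # plain Python float; carries no graph with it
print(type(loss.item()), total)

t = torch.tensor([[1, 2], [3, 4]])
values = t.tolist()           # nested Python list
print(values)
```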

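21. nn.Conv1d & nn.Conv2d
nn.Conv1d:
Main parameters: in_channels (one blog post describes this as the word embedding dimension), out_channels, kernel_size, stride, padding.
Input: (batch_size, number of input channels, text length)
Output: (batch_size, number of output channels (as I understand it, the number of convolution kernels), Lout (computed from the convolution output-length formula))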

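import torch
import torch.nn as nn

# 16 is in_channels (the word embedding dimension), 33 is out_channels (the number of kernels),
# and 3 is the kernel size, so each kernel covers 3 x embedding dimension
m = nn.Conv1d(16, 33, 3, stride=2)
# Input N x C x L: batch_size is 20, C is the word embedding dimension (16), 50 is the sentence length
# Output N x Cout x Lout: as I understand it, Cout is the number of output channels
# Check whether the embedding dimension or the sentence length sits in dim 1 of your input;
# if needed, reorder the dimensions with permute() or transpose().
input2 = torch.randn(20, 16, 50)
output2 = m(input2)
print(output2.size())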
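
In a text model the embedding layer outputs (batch_size, seq_len, embedding_dim), so the last two dimensions have to be swapped before Conv1d. This is a sketch of that step; the vocabulary size of 100 and the other sizes are assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
embeds = nn.Embedding(100, 16)                 # hypothetical vocabulary of 100 words, dim 16
conv = nn.Conv1d(in_channels=16, out_channels=33, kernel_size=3, stride=2)

tokens = torch.randint(0, 100, (20, 50))       # (batch_size, seq_len)
emb = embeds(tokens)                           # (20, 50, 16): embedding dim is last
emb = emb.permute(0, 2, 1)                     # (20, 16, 50): channels moved to dim 1
out = conv(emb)                                # (20, 33, Lout)
print(out.shape)
```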

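22. Pitfall, 2019.01.25:
contiguous() is usually used together with transpose, permute, and view.
That is, after changing the dimension order with transpose or permute, call contiguous() first; only then can view reshape the tensor.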
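
A small sketch of why contiguous() is needed: transpose only changes strides, so the result is no longer laid out contiguously and view refuses to flatten it. The tensor is arbitrary.

```python
import torch

x = torch.arange(6).view(2, 3)
t = x.transpose(0, 1)             # shares storage with x; only the strides change
print(t.is_contiguous())

view_failed = False
try:
    t.view(6)                     # view needs a compatible memory layout
except RuntimeError as err:
    view_failed = True
    print("view failed:", err)

flat = t.contiguous().view(6)     # contiguous() copies into a fresh layout first
alt = t.reshape(6)                # reshape copies on its own when it has to
print(flat, alt)
```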
