Deep Learning with PyTorch: A 60 Minute Blitz

• Gain a solid understanding of PyTorch's tensor library and neural networks
• Train a small neural network to classify images

I. What Is PyTorch?

• A replacement for NumPy that can use the power of GPUs
• A deep learning research platform that provides maximum flexibility and speed

Getting Started

Tensors

``````from __future__ import print_function
import torch
``````

``````# Construct a 5x3 matrix; its memory is left uninitialized
x = torch.Tensor(5, 3)
print(x)
``````

``````1.00000e-10 *
-1.1314  0.0000 -1.1314
0.0000  0.0000  0.0000
0.0000  0.0000  0.0000
0.0000  0.0000  0.0000
0.0000  0.0000  0.0000
[torch.FloatTensor of size 5x3]
``````

``````# Construct a randomly initialized 5x3 matrix
x = torch.rand(5, 3)
print(x)
``````

``````0.2836  0.6710  0.5146
0.8842  0.2821  0.7768
0.3409  0.0428  0.6726
0.1982  0.6950  0.6040
0.0272  0.6586  0.3555
[torch.FloatTensor of size 5x3]
``````

``````print(x.size())
``````

``````torch.Size([5, 3])
``````

`torch.Size` is in fact a tuple, so it supports the same operations as a tuple.

Operations

``````# Addition, syntax 1: the + operator
y = torch.rand(5, 3)
print(x + y)
``````

``````0.9842  1.5171  0.8148
1.1334  1.6540  1.5739
0.9804  1.1647  0.4759
0.6232  0.2689  1.0596
1.0777  1.1705  0.3206
[torch.FloatTensor of size 5x3]
``````

``````# Addition, syntax 2: the torch.add function
print(torch.add(x, y))
``````

``````0.9842  1.5171  0.8148
1.1334  1.6540  1.5739
0.9804  1.1647  0.4759
0.6232  0.2689  1.0596
1.0777  1.1705  0.3206
[torch.FloatTensor of size 5x3]
``````

``````# Addition, writing the sum into a preallocated output tensor
result = torch.Tensor(5, 3)
torch.add(x, y, out=result)
print(result)
``````

``````0.9842  1.5171  0.8148
1.1334  1.6540  1.5739
0.9804  1.1647  0.4759
0.6232  0.2689  1.0596
1.0777  1.1705  0.3206
[torch.FloatTensor of size 5x3]
``````

``````# Addition in place: add x to y
y.add_(x)
print(y)
``````

``````0.9842  1.5171  0.8148
1.1334  1.6540  1.5739
0.9804  1.1647  0.4759
0.6232  0.2689  1.0596
1.0777  1.1705  0.3206
[torch.FloatTensor of size 5x3]
``````

``````print(x[:, 1])
``````

``````1.5171
1.6540
1.1647
0.2689
1.1705
[torch.FloatTensor of size 5]
``````

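NumPy Bridge

A Torch tensor and the NumPy array it is converted to share their underlying memory, so changing one also changes the other.

Converting a Torch tensor to a NumPy array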

``````a = torch.ones(5)
print(a)
``````

``````1
1
1
1
1
[torch.FloatTensor of size 5]
``````
``````b = a.numpy()
print(b)
print(type(b))
``````

``````[ 1.  1.  1.  1.  1.]
<class 'numpy.ndarray'>
``````

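Changing the tensor in place, for example with `a.add_(1)`, changes the NumPy array as well. Printing `a` and then `b` gives:

``````2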
2
2
2
2
[torch.FloatTensor of size 5]

[ 2.  2.  2.  2.  2.]
``````

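Converting a NumPy array to a Torch tensor

The reverse conversion, `torch.from_numpy`, shares memory in the same way: after 1 is added in place to an array of ones, both the array and the tensor built from it hold 2s.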

``````[ 2.  2.  2.  2.  2.]

2
2
2
2
2
[torch.DoubleTensor of size 5]
``````

CUDA Tensors

``````# let us run this cell only if CUDA is available
if torch.cuda.is_available():
    x = x.cuda()
    y = y.cuda()
    x + y
``````


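II. Autograd: Automatic Differentiation

The `autograd` package is central to all neural networks in PyTorch. We first introduce it briefly and then train our first neural network.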

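The `autograd` package provides automatic differentiation for all operations on tensors. It is a define-by-run framework, which means that backpropagation is defined by how your code runs, and every iteration can be different.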

Variable

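`autograd.Variable` is the central class of the `autograd` package. It wraps a `Tensor` and supports nearly all operations defined on it. Once you finish the forward computation, you can call `.backward()` to have all the gradients computed automatically.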

Variable.png

``````import torch
from torch.autograd import Variable
``````

``````x = Variable(torch.ones(2, 2), requires_grad=True)
print(x)
``````

``````Variable containing:
1  1
1  1
[torch.FloatTensor of size 2x2]
``````

``````y = x + 2
print(y)
``````

``````Variable containing:
3  3
3  3
[torch.FloatTensor of size 2x2]
``````

``````print(y.grad_fn)
``````

``````<torch.autograd.function.AddConstantBackward object at 0x7faa6f3bdd68>
None
``````

``````z = y * y * 3
out = z.mean()

print(z, out)
``````

``````Variable containing:
27  27
27  27
[torch.FloatTensor of size 2x2]
Variable containing:
27
[torch.FloatTensor of size 1]
``````

``````out.backward()
``````

``````print(x.grad)
``````

``````Variable containing:
4.5000  4.5000
4.5000  4.5000
[torch.FloatTensor of size 2x2]
``````
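The value 4.5 can be checked by hand. Here `out` is the mean of `3 * (x_i + 2)^2` over the four entries, so the partial derivative with respect to each `x_i` is `(3/2) * (x_i + 2)`, which equals 4.5 at `x_i = 1`. The pure-Python sketch below confirms this with a central finite difference; the helper names `out_fn` and `numerical_grad` are invented for this illustration and are not part of PyTorch.

```python
def out_fn(xs):
    """Mirror the graph above: y = x + 2, z = 3 * y * y, out = mean(z)."""
    zs = [3.0 * (x + 2.0) ** 2 for x in xs]
    return sum(zs) / len(zs)


def numerical_grad(f, xs, i, eps=1e-6):
    """Central finite-difference estimate of d f / d xs[i]."""
    up = list(xs)
    down = list(xs)
    up[i] += eps
    down[i] -= eps
    return (f(up) - f(down)) / (2.0 * eps)


xs = [1.0, 1.0, 1.0, 1.0]          # the four entries of the 2x2 tensor of ones
analytic = [1.5 * (x + 2.0) for x in xs]
numeric = [numerical_grad(out_fn, xs, i) for i in range(len(xs))]

print(out_fn(xs))   # 27.0
print(analytic)     # [4.5, 4.5, 4.5, 4.5]
```

The numerical estimates in `numeric` agree with the analytic values up to finite-difference error.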

``````x = torch.randn(3)
x = Variable(x, requires_grad=True)

y = x * 2
# keep doubling y until its norm reaches 1000
while y.data.norm() < 1000:
    y = y * 2

print(y)
``````

``````Variable containing:
682.4722
-598.8342
692.9528
[torch.FloatTensor of size 3]
``````
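``````gradients = torch.FloatTensor([0.1, 1.0, 0.0001])
# y is not a scalar, so backward needs a gradient for each element of y
y.backward(gradients)

print(x.grad)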
``````

``````Variable containing:
102.4000
1024.0000
0.1024
[torch.FloatTensor of size 3]
``````
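Because `y` is not a scalar here, the backward pass needs a vector that weights each component of `y`. In the run shown above, the printed gradients are exactly that vector multiplied by 1024, which is consistent with `y = 1024 * x` (ten doublings in total). The sketch below reproduces the arithmetic in plain Python. The starting values are an assumption: the random `x` was never printed, so they are back-computed from the printed `y`.

```python
import math

# Assumed starting values: the printed y divided by 1024 (x itself was not shown).
x = [682.4722 / 1024, -598.8342 / 1024, 692.9528 / 1024]

scale = 2.0
y = [scale * v for v in x]
# Keep doubling until the Euclidean norm of y reaches 1000, as in the loop above.
while math.sqrt(sum(v * v for v in y)) < 1000:
    scale *= 2.0
    y = [scale * v for v in x]

# y = scale * x elementwise, so the Jacobian is scale * I and the
# vector-Jacobian product is simply scale * gradients.
gradients = [0.1, 1.0, 0.0001]
x_grad = [scale * g for g in gradients]

print(scale)    # 1024.0
print(x_grad)
```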


III. Neural Networks

mnist.png

Figure: convnet

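A typical training procedure for a neural network is as follows:

1. Define the neural network, which has some learnable parameters (or weights);

2. Iterate over a dataset of inputs;

3. Process each input through the network;

4. Compute the loss (how far the output is from being correct);

5. Propagate gradients back into the network's parameters;

6. Update the network's weights, typically using a simple update rule: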

``````weight = weight - learning_rate * gradient
``````
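As a toy illustration of this update rule outside of any network, the sketch below minimizes the one-dimensional function f(w) = (w - 3)^2, whose gradient is 2(w - 3). The function, the starting point and the learning rate are arbitrary choices made for this example.

```python
def gradient(w):
    """Gradient of f(w) = (w - 3)**2."""
    return 2.0 * (w - 3.0)


weight = 0.0
learning_rate = 0.1

for step in range(100):
    # The rule from the text: weight = weight - learning_rate * gradient
    weight = weight - learning_rate * gradient(weight)

print(round(weight, 4))  # 3.0
```

Each step shrinks the distance to the minimum by a factor of 0.8, so the weight settles at 3.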

Defining the Network

``````import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can specify just a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
print(net)
``````

``````Net (
(conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
(conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
(fc1): Linear (400 -> 120)
(fc2): Linear (120 -> 84)
(fc3): Linear (84 -> 10)
)
``````

`net.parameters()` returns the learnable parameters of the model.

``````params = list(net.parameters())
print(len(params))
for param in params:
    print(param.size())
``````

``````10
torch.Size([6, 1, 5, 5])
torch.Size([6])
torch.Size([16, 6, 5, 5])
torch.Size([16])
torch.Size([120, 400])
torch.Size([120])
torch.Size([84, 120])
torch.Size([84])
torch.Size([10, 84])
torch.Size([10])
``````
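The ten sizes above are the weight and bias of each of the five layers. Multiplying out each shape gives the number of learnable scalars. The sketch below does this in plain Python from the printed sizes, so it needs no PyTorch.

```python
from functools import reduce
from operator import mul

# Shapes copied from the printed output above: (weight, bias) per layer.
shapes = [
    (6, 1, 5, 5), (6,),        # conv1
    (16, 6, 5, 5), (16,),      # conv2
    (120, 400), (120,),        # fc1
    (84, 120), (84,),          # fc2
    (10, 84), (10,),           # fc3
]

counts = [reduce(mul, shape, 1) for shape in shapes]
total = sum(counts)

print(counts)  # [150, 6, 2400, 16, 48000, 120, 10080, 84, 840, 10]
print(total)   # 61706
```

Most of the parameters sit in the first fully connected layer, not in the convolutions.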

The input to `forward` and its output are both `autograd.Variable`s. Note: this network (LeNet) expects an input size of 32*32. To train it on the MNIST dataset, resize the images to 32*32.
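The 32*32 requirement follows from the layer sizes: each 5*5 convolution without padding shrinks a side by 4, and each 2*2 max pooling halves it, so a side of 32 becomes 5 by the time it reaches `fc1`, matching the `16 * 5 * 5` input size. A small arithmetic sketch (the helper names `conv_out` and `pool_out` are illustrative, not library functions):

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window):
    """Output side length of non-overlapping max pooling."""
    return size // window


side = 32
side = pool_out(conv_out(side, 5), 2)   # conv1 + pool: 32 -> 28 -> 14
side = pool_out(conv_out(side, 5), 2)   # conv2 + pool: 14 -> 10 -> 5
flat_features = 16 * side * side        # 16 channels come out of conv2

print(side, flat_features)  # 5 400

# A 28x28 MNIST image fed in directly would end up with a different side:
mnist_side = pool_out(conv_out(pool_out(conv_out(28, 5), 2), 5), 2)
print(mnist_side)  # 4
```

With a 28*28 input the flattened size would be 16 * 4 * 4 = 256, which no longer matches the 400 inputs that `fc1` expects.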

``````input = Variable(torch.randn(1, 1, 32, 32))
out = net(input)
print(out)
``````

``````Variable containing:
-0.0536 -0.0548 -0.1079  0.0030  0.0521 -0.1061 -0.1456 -0.0095  0.0704  0.0259
[torch.FloatTensor of size 1x10]
``````

``````# Zero the gradient buffers of all parameters, then backpropagate random gradients
net.zero_grad()
out.backward(torch.randn(1, 10))
``````

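1. `torch.nn` only supports mini-batches: the whole package takes mini-batches of samples as input, not a single sample.
2. For example, `nn.Conv2d` takes a 4D tensor of nSamples * nChannels * Height * Width.
3. If you have a single sample, use `input.unsqueeze(0)` to add a fake batch dimension.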

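• `torch.Tensor`: a multi-dimensional array.
• `autograd.Variable`: wraps a `Tensor` and records the history of operations applied to it. It has the same API as a `Tensor`, with additions such as `backward()`, and it also holds the gradient with respect to the tensor.
• `nn.Module`: a neural network module. A convenient way of encapsulating parameters, with helpers for moving them to the GPU, exporting, loading, and so on.
• `nn.Parameter`: a kind of Variable that is automatically registered as a parameter when assigned as an attribute to a `Module`.
• `autograd.Function`: implements the forward and backward definitions of an autograd operation. Every `Variable` operation creates at least a single `Function` node that connects to the functions that created a `Variable` and encodes its history.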

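So far we have:

• Defined a neural network
• Processed inputs and called `backward`

Still left:

• Computing the loss
• Updating the weights of the network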

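Loss Function

The `nn` package provides several different loss functions. A simple one is `nn.MSELoss`, which computes the mean squared error between the input (here, the network's output) and the target.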

``````out = net(input)
target = Variable(torch.arange(1, 11))  # a dummy target, for example
criterion = nn.MSELoss()

loss = criterion(out, target)
print(loss)
``````

``````Variable containing:
38.1365
[torch.FloatTensor of size 1]
``````
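For reference, the mean squared error is the average of the squared elementwise differences. The plain-Python sketch below computes it for the dummy target 1, 2, ..., 10 and a hypothetical all-zero output. An untrained network produces outputs close to zero, which is why the loss printed above has a similar magnitude.

```python
def mse(output, target):
    """Mean of the squared elementwise differences between output and target."""
    assert len(output) == len(target)
    return sum((o - t) ** 2 for o, t in zip(output, target)) / len(output)


target = [float(t) for t in range(1, 11)]   # the dummy target 1, 2, ..., 10
output = [0.0] * 10                         # hypothetical all-zero network output

print(mse(output, target))  # 38.5
```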

``````input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
-> view -> linear -> relu -> linear -> relu -> linear
-> MSELoss
-> loss
``````

``````print(loss.grad_fn)  # MSELoss
``````

``````<torch.autograd.function.MSELossBackward object at 0x7fb3c0dcf4f8>
``````

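Backpropagation

To backpropagate the error, call `loss.backward()`. Clear the existing gradients first with `net.zero_grad()`, otherwise the new gradients are accumulated onto the old ones. The output below shows `conv1.bias.grad` before and after the backward pass.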

``````conv1.bias.grad before backward
Variable containing:
0
0
0
0
0
0
[torch.FloatTensor of size 6]

Variable containing:
-0.0317
-0.1682
-0.0158
0.2276
-0.0148
-0.0254
[torch.FloatTensor of size 6]
``````

``````conv1.bias.grad before backward
None
Variable containing:
0.0011
0.1170
-0.0012
-0.0204
-0.0325
-0.0648
[torch.FloatTensor of size 6]
``````

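• Updating the weights of the network

Updating the Weights

The simplest update rule used in practice is stochastic gradient descent (SGD):

$$\text{weight} = \text{weight} - \text{learning\_rate} \times \text{gradient}$$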

``````learning_rate = 0.01
for f in net.parameters():
    # in-place update: f = f - learning_rate * gradient
    f.data.sub_(f.grad.data * learning_rate)
``````

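``````import torch.optim as optim

# create the optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in the training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # does the update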
``````


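IV. Training a Classifier

What About Data?

• For images, packages such as Pillow and OpenCV are useful.
• For audio, packages such as scipy and librosa are useful.
• For text, either raw Python or Cython based loading, or NLTK and SpaCy, are useful.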

cifar10.png

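Training an Image Classifier

1. Load and normalize the CIFAR10 training and test datasets using `torchvision`.
2. Define a convolutional neural network.
3. Define a loss function.
4. Train the network on the training data.
5. Test the network on the test data.

1. Loading and Normalizing CIFAR10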

``````import torch
import torchvision
import torchvision.transforms as transforms
``````

The output of the torchvision datasets is PILImage images in the range [0, 1]. We transform them into tensors normalized to the range [-1, 1].

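``````transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')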
``````

``````Files already downloaded and verified
``````
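`Normalize` with a mean of 0.5 and a standard deviation of 0.5 applies `(x - 0.5) / 0.5` to every channel value, which maps [0, 1] onto [-1, 1]. The `img / 2 + 0.5` step inside `imshow` undoes it before display. A scalar sketch of both directions, with function names chosen for this illustration:

```python
def normalize(value, mean=0.5, std=0.5):
    """Per-channel normalization: (value - mean) / std."""
    return (value - mean) / std


def unnormalize(value):
    """Inverse used when displaying images: value / 2 + 0.5."""
    return value / 2 + 0.5


pixels = [0.0, 0.25, 0.5, 1.0]
normalized = [normalize(p) for p in pixels]
restored = [unnormalize(n) for n in normalized]

print(normalized)  # [-1.0, -0.5, 0.0, 1.0]
print(restored)    # [0.0, 0.25, 0.5, 1.0]
```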

``````import matplotlib.pyplot as plt
import numpy as np

# functions to show an image

def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

# get some random training images
dataiter = iter(trainloader)
images, labels = dataiter.next()

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))
``````

``````truck   cat   car plane
``````
sphx_glr_cifar10_tutorial_001.png

2. Defining a Convolutional Neural Network

``````from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 3 input channels, because CIFAR10 images are RGB
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
``````

3. Defining a Loss Function and Optimizer

``````import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
``````
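With `momentum=0.9`, SGD keeps a running velocity for each parameter and steps along that velocity instead of the raw gradient. The scalar sketch below shows one common form of the update (velocity = momentum * velocity + gradient, then weight = weight - lr * velocity). It is a simplified illustration, not the optimizer's source code, and the quadratic objective is made up for the example.

```python
def sgd_momentum_step(weight, velocity, grad, lr=0.001, momentum=0.9):
    """One SGD-with-momentum update for a single scalar parameter."""
    velocity = momentum * velocity + grad
    weight = weight - lr * velocity
    return weight, velocity


# Toy objective f(w) = (w - 3)**2 with gradient 2 * (w - 3); not from the tutorial.
weight, velocity = 0.0, 0.0
for _ in range(5000):
    grad = 2.0 * (weight - 3.0)
    weight, velocity = sgd_momentum_step(weight, velocity, grad)

print(round(weight, 3))  # 3.0
```

Because past gradients keep contributing through the velocity, the effective step is larger than `lr` alone when successive gradients point the same way.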

4. Training the Network

``````for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data

        # wrap them in Variable
        inputs, labels = Variable(inputs), Variable(labels)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.data[0]
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
``````

``````[1,  2000] loss: 2.191
[1,  4000] loss: 1.866
[1,  6000] loss: 1.696
[1,  8000] loss: 1.596
[1, 10000] loss: 1.502
[1, 12000] loss: 1.496
[2,  2000] loss: 1.422
[2,  4000] loss: 1.370
[2,  6000] loss: 1.359
[2,  8000] loss: 1.321
[2, 10000] loss: 1.311
[2, 12000] loss: 1.275
Finished Training
``````
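A useful sanity check on these numbers: with 10 classes, a classifier that gives every class the same score has a cross-entropy loss of ln(10), about 2.303. The first reported loss of 2.191 is therefore only slightly better than guessing, and the loss falls from there. The sketch below computes cross-entropy from raw scores in plain Python. It is a simplified single-sample version written for this illustration, and the score vectors are made up.

```python
import math


def cross_entropy(logits, label):
    """Negative log-softmax probability of the true class for one sample."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum_exp - logits[label]


uniform = cross_entropy([0.0] * 10, label=3)
confident = cross_entropy([0.0] * 3 + [5.0] + [0.0] * 6, label=3)

print(round(uniform, 3))      # 2.303
print(confident < uniform)    # True
```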

5. Testing the Network on the Test Data

``````dataiter = iter(testloader)
images, labels = dataiter.next()

# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))
``````
sphx_glr_cifar10_tutorial_002.png

``````GroundTruth:    cat  ship  ship plane
``````

``````outputs = net(Variable(images))
``````

``````_, predicted = torch.max(outputs.data, 1)

print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
for j in range(4)))
``````

``````Predicted:    cat  ship   car plane
``````

``````correct = 0
total = 0
# accumulate the counts over every mini-batch of the test set
for data in testloader:
    images, labels = data
    outputs = net(Variable(images))
    _, predicted = torch.max(outputs.data, 1)
    total += labels.size(0)
    correct += (predicted == labels).sum()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
``````

``````Accuracy of the network on the 10000 test images: 55 %
``````
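Accuracy here is the fraction of samples whose highest-scoring class matches the label. With 10 balanced classes, random guessing would reach about 10%, so 55% shows that the network has learned something. The sketch below repeats the same bookkeeping in plain Python on made-up scores for 4 samples and 3 classes.

```python
def argmax(scores):
    """Index of the largest score in a list."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Made-up scores for a batch of 4 samples and 3 classes (illustrative only).
outputs = [
    [0.1, 2.0, -1.0],
    [1.5, 0.3, 0.2],
    [-0.5, 0.0, 0.7],
    [0.9, 1.1, 0.4],
]
labels = [1, 0, 2, 0]

predicted = [argmax(row) for row in outputs]
correct = sum(int(p == t) for p, t in zip(predicted, labels))
total = len(labels)

print(predicted)                          # [1, 0, 2, 1]
print('%d %%' % (100 * correct / total))  # 75 %
```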

``````class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
for data in testloader:
    images, labels = data
    outputs = net(Variable(images))
    _, predicted = torch.max(outputs.data, 1)
    c = (predicted == labels).squeeze()
    for i in range(4):  # each mini-batch holds 4 images
        label = labels[i]
        class_correct[label] += c[i]
        class_total[label] += 1

for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))
``````

``````Accuracy of plane : 60 %
Accuracy of   car : 46 %
Accuracy of  bird : 44 %
Accuracy of   cat : 35 %
Accuracy of  deer : 38 %
Accuracy of   dog : 43 %
Accuracy of  frog : 57 %
Accuracy of horse : 76 %
Accuracy of  ship : 71 %
Accuracy of truck : 74 %
``````

Training on GPU

``````net.cuda()
``````

``````inputs, labels = Variable(inputs.cuda()), Variable(labels.cuda())
``````

• Gained an understanding of PyTorch's tensor library and neural networks.
• Trained a small network to classify images.


V. Data Parallelism (Optional)

PyTorch makes it very easy to use GPUs. You can put a model on a GPU as follows:

``````model.cuda()
``````

``````mytensor = mytensor.cuda()
``````

``````model = nn.DataParallel(model)
``````

Imports and Parameters

``````import torch
import torch.nn as nn
from torch.autograd import Variable
from torch.utils.data import Dataset, DataLoader

input_size = 5
output_size = 2

batch_size = 30
data_size = 100
``````

Dummy Dataset

``````class RandomDataset(Dataset):

    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len

rand_loader = DataLoader(dataset=RandomDataset(input_size, data_size),
                         batch_size=batch_size, shuffle=True)
``````

Simple Model

``````class Model(nn.Module):
    # Our model

    def __init__(self, input_size, output_size):
        super(Model, self).__init__()
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, input):
        output = self.fc(input)
        print("  In Model: input size", input.size(),
              "output size", output.size())

        return output
``````

Running the Model

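``````# Create the model; wrap it in DataParallel when several GPUs are visible
model = Model(input_size, output_size)
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    model = nn.DataParallel(model)
if torch.cuda.is_available():
    model.cuda()

for data in rand_loader:
    if torch.cuda.is_available():
        input_var = Variable(data.cuda())
    else:
        input_var = Variable(data)

    output = model(input_var)
    print("Outside: input size", input_var.size(),
          "output_size", output.size())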
``````

``````In Model: input size torch.Size([30, 5]) output size torch.Size([30, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([30, 5]) output size torch.Size([30, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([30, 5]) output size torch.Size([30, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
``````

Results

2 GPUs

``````# on 2 GPUs
Let's use 2 GPUs!
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
``````

3 GPUs

``````Let's use 3 GPUs!
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
``````

8 GPUs

``````Let's use 8 GPUs!
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
``````

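Summary

DataParallel automatically splits your data and sends job orders to multiple model replicas on several GPUs. After each replica finishes its job, DataParallel collects and merges the results before returning them to you.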
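The per-GPU batch sizes in the logs above are consistent with cutting each batch into chunks of `ceil(batch / n_gpus)` samples, with the last chunk taking whatever remains. The sketch below reproduces those sizes. It is only a model of the observed output, not DataParallel's actual implementation, and `chunk_sizes` is a name invented here.

```python
def chunk_sizes(batch, n_gpus):
    """Split `batch` samples into at most `n_gpus` chunks of ceil(batch / n_gpus)."""
    per_chunk = -(-batch // n_gpus)  # ceiling division
    sizes = []
    remaining = batch
    while remaining > 0:
        take = min(per_chunk, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


print(chunk_sizes(30, 2))  # [15, 15]
print(chunk_sizes(10, 3))  # [4, 4, 2]
print(chunk_sizes(30, 8))  # [4, 4, 4, 4, 4, 4, 4, 2]
print(chunk_sizes(10, 8))  # [2, 2, 2, 2, 2]
```

The last case matches the final batch in the 8-GPU log, where only five replicas receive data because ten samples fill just five chunks of two.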

http://pytorch.org/tutorials/beginner/former_torchies/parallelism_tutorial.html
