Neural Network Compression Experiment: Deep Compression

First published on my personal blog; best read together with my reading notes on the Deep Compression paper.

Experiment Setup

Building the Base Network

To run deep compression, we first need a trained deep neural network. For simplicity, this experiment uses a small model with two convolutional layers followed by a two-layer MLP.

import torch as pt
import torchvision as ptv

class net(pt.nn.Module):

    def __init__(self):
        super(net,self).__init__()
        self.conv1 = pt.nn.Conv2d(in_channels=1,out_channels=64,kernel_size=3,padding=1)
        self.conv2 = pt.nn.Conv2d(in_channels=64,out_channels=256,kernel_size=3,padding=1)
        self.fc1 = pt.nn.Linear(in_features=7*7*256,out_features=512)
        self.fc2 = pt.nn.Linear(in_features=512,out_features=10)
        self.pool = pt.nn.MaxPool2d(2)

    def forward(self,x):
        # two conv+pool stages reduce 28x28 MNIST images to 7x7x256
        x = self.pool(pt.nn.functional.relu(self.conv1(x)))
        x = self.pool(pt.nn.functional.relu(self.conv2(x)))
        x = pt.nn.functional.relu(self.fc1(x.view((-1,7*7*256))))
        return self.fc2(x)
model = net().cuda()
print(model)
print(model(pt.rand(1,1,28,28).cuda()))
net(
  (conv1): Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv2): Conv2d(64, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (fc1): Linear(in_features=12544, out_features=512, bias=True)
  (fc2): Linear(in_features=512, out_features=10, bias=True)
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
tensor(1.00000e-02 *
       [[-7.7157,  3.0435, -6.5732,  6.5343, -4.2159, -2.8651, -0.6792,
          3.9223, -3.7523,  2.4532]], device='cuda:0')

Training the Base Network

Preparing the Dataset

train_dataset = ptv.datasets.MNIST("./",download=True,transform=ptv.transforms.ToTensor())
test_dataset = ptv.datasets.MNIST("./",train=False,transform=ptv.transforms.ToTensor())
trainloader = pt.utils.data.DataLoader(train_dataset,shuffle=True,batch_size=128)
testloader = pt.utils.data.DataLoader(test_dataset,shuffle=True,batch_size=128)

Loss Function and Optimizer

lossfunc = pt.nn.CrossEntropyLoss().cuda()
optimizer = pt.optim.Adam(model.parameters(),1e-4)
def acc(outputs,label):
    # fraction of samples whose argmax prediction matches the label
    _,data = pt.max(outputs,dim=1)
    return pt.mean((data.float()==label.float()).float()).item()

Training the Network

for _ in range(1):
    for i,(data,label) in enumerate(trainloader):
        data,label = data.cuda(),label.cuda()
        model.zero_grad()
        outputs = model(data)
        loss = lossfunc(outputs,label)
        loss.backward()
        optimizer.step()
        if i % 100 == 0:
            print(i,acc(outputs,label))
0 0.1171875
100 0.8984375
200 0.953125
300 0.984375
400 0.96875

Testing the Network

def test_model(model,testloader):
    result = []
    for data,label in testloader:
        data,label = data.cuda(),label.cuda()
        outputs = model(data)
        result.append(acc(outputs,label))
    result = sum(result) / len(result)
    print(result)
    return result
test_model(model,testloader)
0.96875

Saving the Network

pt.save(model.state_dict(),"./base.ptb")

Pruning Experiment

Pruning is the first step of deep compression: every weight whose magnitude is below a chosen threshold is set to 0, meaning that connection is removed from the network. During the subsequent fine-tuning, the gradients of pruned connections are also zeroed, so they take no part in training.
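
As a toy illustration of the idea (separate from the experiment code below), thresholding a small weight tensor and deriving the 0/1 mask that later freezes gradients looks like this:

import torch as pt

w = pt.tensor([0.30, -0.005, 0.02, -0.50])
threshold = 0.01
w[pt.abs(w) < threshold] = 0        # prune small-magnitude weights
mask = (w != 0).float()             # 1 where the connection survives
print(w)     # tensor([ 0.3000,  0.0000,  0.0200, -0.5000])
print(mask)  # tensor([1., 0., 1., 1.])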

Preparing the Tools

The pruning experiment needs a few helper functions: a pruning function, a gradient-pruning function, and a sparsity-evaluation function.

Pruning Function

The pruning function takes a model and a threshold, and sets every weight whose absolute value is smaller than the threshold to 0.

def puring(model,threshold):
    # zero out every weight whose magnitude is below the threshold
    for i in model.parameters():
        i.data[pt.abs(i) < threshold] = 0
    return model

Gradient-Pruning Function

The gradient-pruning function zeroes the gradient of every pruned weight, so fine-tuning cannot revive removed connections.

def grad_puring(model):
    # zero the gradient wherever the weight was pruned, so pruned
    # connections stay at 0 during fine-tuning
    for i in model.parameters():
        mask = (i.data != 0).float()
        i.grad.data.mul_(mask)

Sparsity-Evaluation Function

This function reports, for each layer and overall, the fraction of weights that are still nonzero (1.0 means fully dense).

def print_sparse(model):
    result = []
    total_num = 0
    total_sparse = 0
    print("-----------------------------------")
    print("Layer sparse")
    for name,f in model.named_parameters():
        num = f.view(-1).shape[0]          # total number of weights
        total_num += num
        sparse = pt.nonzero(f).shape[0]    # number of surviving weights
        total_sparse += sparse
        print("\t",name,sparse/num)
        result.append(sparse/num)
    total = total_sparse/total_num
    print("Total:",total)
    return total

Pruning

First, check the sparsity of the original network:

model = net().cuda()
model.load_state_dict(pt.load("./base.ptb"))
_ = test_model(model,testloader)
0.96875
print_sparse(model)
-----------------------------------
Layer sparse
     conv1.weight 1.0
     conv1.bias 1.0
     conv2.weight 1.0
     conv2.bias 1.0
     fc1.weight 1.0
     fc1.bias 1.0
     fc2.weight 1.0
     fc2.bias 1.0
Total: 1.0

As shown, the original network has no sparsity at all. Now prune with a threshold of 0.01, removing every connection with magnitude below 0.01. The results show that after pruning only about 8.4% of the parameters survive, while accuracy is essentially unaffected.

model1 = puring(model,0.01)
test_model(model1,testloader)
print_sparse(model1)
0.9706289556962026
-----------------------------------
Layer sparse
     conv1.weight 0.9739583333333334
     conv1.bias 0.90625
     conv2.weight 0.7641262478298612
     conv2.bias 0.71875
     fc1.weight 0.06729390669842156
     fc1.bias 0.025390625
     fc2.weight 0.7837890625
     fc2.bias 0.9
Total: 0.08358673475128647

0.08358673475128647

Raising the threshold to 0.1 leaves almost no parameters, and accuracy collapses to chance level:

model.load_state_dict(pt.load("./base.ptb"))
model2 = puring(model,0.1)
test_model(model2,testloader)
print_sparse(model2)
0.09760680379746836
-----------------------------------
Layer sparse
     conv1.weight 0.671875
     conv1.bias 0.6875
     conv2.weight 0.0
     conv2.bias 0.0
     fc1.weight 0.0
     fc1.bias 0.0
     fc2.weight 0.0
     fc2.bias 0.0
Total: 6.553616029871108e-05

6.553616029871108e-05

Next, sweep the threshold over a grid from 0.01 to 0.1 with a step of 0.01:

sparse_list = []
threshold_list = [x*0.01+0.01 for x in range(10)]   # thresholds 0.01 .. 0.10
acc_list = []
for i in threshold_list:
    model.load_state_dict(pt.load("./base.ptb"))
    model3 = puring(model,i)
    acc_list.append(test_model(model3,testloader))
    sparse_list.append(print_sparse(model3))
0.9706289556962026
-----------------------------------
Layer sparse
     conv1.weight 0.9739583333333334
     conv1.bias 0.90625
     conv2.weight 0.7641262478298612
     conv2.bias 0.71875
     fc1.weight 0.06729390669842156
     fc1.bias 0.025390625
     fc2.weight 0.7837890625
     fc2.bias 0.9
Total: 0.08358673475128647
0.47735363924050633
-----------------------------------
Layer sparse
     conv1.weight 0.9375
     conv1.bias 0.890625
     conv2.weight 0.5333726671006944
     conv2.bias 0.4765625
     fc1.weight 0.0015011222995057398
     fc1.bias 0.0
     fc2.weight 0.5765625
     fc2.bias 0.7
Total: 0.01398429139292775
0.09513449367088607
-----------------------------------
Layer sparse
     conv1.weight 0.9045138888888888
     conv1.bias 0.890625
     conv2.weight 0.3156263563368056
     conv2.bias 0.2578125
     fc1.weight 1.5414490991709182e-05
     fc1.bias 0.0
     fc2.weight 0.371875
     fc2.bias 0.4
Total: 0.007479941525322959
0.09612341772151899
-----------------------------------
Layer sparse
     conv1.weight 0.8732638888888888
     conv1.bias 0.875
     conv2.weight 0.13545735677083334
     conv2.bias 0.0546875
     fc1.weight 0.0
     fc1.bias 0.0
     fc2.weight 0.1615234375
     fc2.bias 0.1
Total: 0.003250198205069488
0.09691455696202532
-----------------------------------
Layer sparse
     conv1.weight 0.8402777777777778
     conv1.bias 0.84375
     conv2.weight 0.03839111328125
     conv2.bias 0.00390625
     fc1.weight 0.0
     fc1.bias 0.0
     fc2.weight 0.016796875
     fc2.bias 0.0
Total: 0.0009558243703890901
0.1003757911392405
-----------------------------------
Layer sparse
     conv1.weight 0.8142361111111112
     conv1.bias 0.796875
     conv2.weight 0.0084228515625
     conv2.bias 0.0
     fc1.weight 0.0
     fc1.bias 0.0
     fc2.weight 0.0
     fc2.bias 0.0
Total: 0.00026792277133719006
0.09760680379746836
-----------------------------------
Layer sparse
     conv1.weight 0.7760416666666666
     conv1.bias 0.765625
     conv2.weight 0.0014580620659722222
     conv2.bias 0.0
     fc1.weight 0.0
     fc1.bias 0.0
     fc2.weight 0.0
     fc2.bias 0.0
Total: 0.00010811185608441666
0.09760680379746836
-----------------------------------
Layer sparse
     conv1.weight 0.7447916666666666
     conv1.bias 0.734375
     conv2.weight 0.00014241536458333334
     conv2.bias 0.0
     fc1.weight 0.0
     fc1.bias 0.0
     fc2.weight 0.0
     fc2.bias 0.0
Total: 7.55718600196274e-05
0.09968354430379747
-----------------------------------
Layer sparse
     conv1.weight 0.7065972222222222
     conv1.bias 0.71875
     conv2.weight 0.0
     conv2.bias 0.0
     fc1.weight 0.0
     fc1.bias 0.0
     fc2.weight 0.0
     fc2.bias 0.0
Total: 6.888139353901653e-05
0.09760680379746836
-----------------------------------
Layer sparse
     conv1.weight 0.671875
     conv1.bias 0.6875
     conv2.weight 0.0
     conv2.bias 0.0
     fc1.weight 0.0
     fc1.bias 0.0
     fc2.weight 0.0
     fc2.bias 0.0
Total: 6.553616029871108e-05
import matplotlib.pyplot as plt
plt.figure(figsize=(10,3))
plt.subplot(131)
plt.plot(threshold_list,acc_list)      # threshold vs. accuracy
plt.subplot(132)
plt.plot(threshold_list,sparse_list)   # threshold vs. sparsity
plt.subplot(133)
plt.plot(sparse_list,acc_list)         # sparsity vs. accuracy
plt.show()
[Figure output_30_0.png: three-panel plot described below]

From left to right, the panels above show threshold vs. accuracy, threshold vs. sparsity, and sparsity vs. accuracy.

Fine-Tuning After Pruning

We saw that at a threshold of about 0.02, accuracy drops to roughly 47%. To recover it, we fine-tune the pruned network, using the gradient-pruning function so that pruned connections stay at zero.

model = net().cuda()
model.load_state_dict(pt.load("./base.ptb"))
model1 = puring(model,0.02)
test_model(model1,testloader)
print_sparse(model1)
0.4759691455696203
-----------------------------------
Layer sparse
     conv1.weight 0.9375
     conv1.bias 0.890625
     conv2.weight 0.5333726671006944
     conv2.bias 0.4765625
     fc1.weight 0.0015011222995057398
     fc1.bias 0.0
     fc2.weight 0.5765625
     fc2.bias 0.7
Total: 0.01398429139292775
optimizer = pt.optim.Adam(model1.parameters(),1e-5)
lossfunc = pt.nn.CrossEntropyLoss().cuda()
for _ in range(4):
    for i,(data,label) in enumerate(trainloader):
        data,label = data.cuda(),label.cuda()
        model1.zero_grad()              # clear accumulated gradients
        outputs = model1(data)
        loss = lossfunc(outputs,label)
        loss.backward()
        grad_puring(model1)             # zero gradients of pruned weights
        optimizer.step()
        if i % 100 == 0:
            print(i,acc(outputs,label))

0 0.4375
100 0.4375
200 0.5625
300 0.6015625
400 0.6875
0 0.7265625
100 0.6953125
200 0.7890625
300 0.8046875
400 0.7734375
0 0.8125
100 0.8046875
200 0.890625
300 0.8515625
400 0.875
0 0.859375
100 0.8515625
200 0.9140625
300 0.890625
400 0.9296875
test_model(model1,testloader)
print_sparse(model1)
pt.save(model1.state_dict(),'./puring.pt')
0.9367088607594937
-----------------------------------
Layer sparse
     conv1.weight 0.9375
     conv1.bias 0.890625
     conv2.weight 0.5333726671006944
     conv2.bias 0.4765625
     fc1.weight 0.0015011222995057398
     fc1.bias 0.0
     fc2.weight 0.5765625
     fc2.bias 0.7
Total: 0.01398429139292775

As shown above, fine-tuning the surviving weights raises accuracy back above 93% while keeping the sparsity pattern exactly unchanged.

Quantization Experiment

Quantization is more involved and consists of two steps: quantization itself, followed by fine-tuning. The quantization step clusters each layer's surviving weights into 2^bit shared values with scikit-learn's k-means; the fine-tuning step is implemented directly in PyTorch.
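
The storage saving comes from weight sharing: each surviving weight is stored as a bit-width index into a small codebook of 2^bit floats. As a toy illustration of this encode/decode round trip (a hypothetical example, separate from the experiment code):

import numpy as np
from sklearn.cluster import KMeans

bit = 2
w = np.array([0.31, -0.29, 0.52, 0.48, -0.27, 0.33])   # toy nonzero weights
kmn = KMeans(n_clusters=2 ** bit, n_init=10).fit(w.reshape(-1, 1))

codes = kmn.predict(w.reshape(-1, 1))        # 2-bit index per weight
codebook = kmn.cluster_centers_.reshape(-1)  # 2 ** bit shared values
w_hat = codebook[codes]                      # decoded (quantized) weights
print(codes)     # e.g. [2 0 3 3 0 2] -- indices into the codebook
print(w_hat)     # each weight replaced by its cluster centroid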

Quantization

model = net().cuda()
model.load_state_dict(pt.load("./puring.pt"))
test_model(model,testloader)
0.9367088607594937
from sklearn.cluster import KMeans
import numpy as np
kmean_list = []
bit = 2                                  # quantize to 2 ** bit shared values
for name,i in model.named_parameters():
    data = i.data.clone().view(-1).cpu().detach().numpy().reshape(-1)
    data = data[data != 0]               # only quantize surviving weights
    if data.size < 2 ** bit:
        kmean_list.append(None)          # too few weights to cluster
        continue
    # linear initialization: centroids evenly spaced between min and max
    init = [x*(np.max(data)-np.min(data))/(2 ** bit) + np.min(data) for x in range(2 ** bit)]
    kmn = KMeans(2 ** bit,init=np.array(init).reshape(2 ** bit,1),n_init=1)
    kmn.fit(data.reshape((-1,1)))
    kmean_list.append(kmn)
    print(name,i.shape)
conv1.weight torch.Size([64, 1, 3, 3])
conv1.bias torch.Size([64])
conv2.weight torch.Size([256, 64, 3, 3])
conv2.bias torch.Size([256])
fc1.weight torch.Size([512, 12544])
fc2.weight torch.Size([10, 512])
fc2.bias torch.Size([10])



Once the per-layer quantizers are trained, quantize each layer with its own quantizer. Each weight is replaced by its cluster centroid, and the per-weight cluster indices and the codebook are stored on the parameter for later fine-tuning:

for i,(name,f) in enumerate(model.named_parameters()):
    data = f.data.clone().view(-1).cpu().detach().numpy().reshape(-1)
    data_nozero = data[data != 0].reshape((-1,1))
    if data_nozero.size == 0 or data.size < 2 ** bit or kmean_list[i] is None:
        f.kmeans_result = None           # layer was not quantized
        f.kmeans_label = None
        continue

    # mark pruned positions with -1 so they are never assigned a cluster
    result = data.copy()
    result[result == 0] = -1

    # replace every surviving weight by its cluster centroid
    label = kmean_list[i].predict(data_nozero).reshape(-1)
    new_data = np.array([kmean_list[i].cluster_centers_[x] for x in label])
    data[data != 0] = new_data.reshape(-1)
    f.data = pt.from_numpy(data).view(f.data.shape).cuda()

    # store per-weight cluster indices and the codebook for fine-tuning
    result[result != -1] = label
    f.kmeans_result = pt.from_numpy(result).view(f.data.shape).cuda()
    f.kmeans_label = pt.from_numpy(kmean_list[i].cluster_centers_).cuda()
test_model(model,testloader)
print_sparse(model)
0.8919106012658228
-----------------------------------
Layer sparse
     conv1.weight 0.9375
     conv1.bias 0.890625
     conv2.weight 0.5333726671006944
     conv2.bias 0.4765625
     fc1.weight 0.0015011222995057398
     fc1.bias 0.0
     fc2.weight 0.5765625
     fc2.bias 0.7
Total: 0.01398429139292775

0.01398429139292775

This shows that for a toy network like this one, 2-bit quantization (four shared values per layer) is essentially sufficient, costing about four percentage points of accuracy (0.937 → 0.892).

Fine-Tuning

Following the deep compression paper, the gradient of each shared centroid is the sum of the gradients of all weights assigned to that cluster; after each centroid update, the new shared value is written back into the corresponding weights.

lossfunc = pt.nn.CrossEntropyLoss().cuda()
lr = 0.001
for _ in range(1):
    for a,(data,label) in enumerate(trainloader):
        data,label = data.cuda(),label.cuda()
        model.zero_grad()
        outputs = model(data)
        loss = lossfunc(outputs,label)
        loss.backward()

        for name,i in model.named_parameters():
            if i.kmeans_result is None:   # layer was not quantized
                continue
            for x in range(2 ** bit):
                # centroid gradient: sum of the gradients of all weights
                # assigned to cluster x
                grad = pt.sum(i.grad.detach()[i.kmeans_result == x])
                i.kmeans_label[x] += -lr * grad.item()
                # write the updated shared value back into the weights
                i.data[i.kmeans_result == x] = i.kmeans_label[x].item()
        if a % 100 == 0:
            print(a,acc(outputs,label))
0 0.8828125
100 0.921875
200 0.9296875
300 0.9296875
400 0.9140625
test_model(model,testloader)
print_sparse(model)
pt.save(model.state_dict(),"quantization.pt")
0.9384889240506329
-----------------------------------
Layer sparse
     conv1.weight 0.9375
     conv1.bias 0.890625
     conv2.weight 0.5333726671006944
     conv2.bias 0.4765625
     fc1.weight 0.0015011222995057398
     fc1.bias 0.0
     fc2.weight 0.5765625
     fc2.bias 0.7
Total: 0.01398429139292775

After fine-tuning the cluster centroids, the 2-bit quantized network matches the accuracy of the unquantized pruned network (0.938 vs. 0.937), with the sparsity unchanged.
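
As a rough, back-of-the-envelope sketch (not part of the original experiment), we can estimate what pruning plus 2-bit weight sharing buys in storage. The `estimate_compression` helper below is hypothetical; it counts only codebook indices for surviving weights and ignores the cost of encoding their positions, so it is an optimistic bound:

def estimate_compression(model, bit=2):
    # compare dense fp32 storage against storing only the surviving
    # weights as `bit`-bit codebook indices plus a per-layer codebook
    dense_bits = 0
    compressed_bits = 0
    for f in model.parameters():
        num = f.numel()
        nonzero = pt.nonzero(f).shape[0]
        dense_bits += num * 32                    # fp32 baseline
        compressed_bits += nonzero * bit          # index per survivor
        compressed_bits += (2 ** bit) * 32        # per-layer codebook
    print("dense: %.1f KB" % (dense_bits / 8 / 1024))
    print("compressed (approx): %.1f KB" % (compressed_bits / 8 / 1024))
    print("ratio: %.1fx" % (dense_bits / compressed_bits))

estimate_compression(model)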
