
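The code in this article comes from the book "deep learning on keras". It is an excellent book, and if your English is good I recommend reading it directly. If you are short on time, you can read this series of articles instead, which consists of my own translation plus some of my own thoughts. My ability is limited and there are bound to be shortcomings, so corrections are welcome. All rights to the translation are reserved; please contact me before reposting.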



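  • Visualizing intermediate convnet outputs ("intermediate activations"). This is useful for understanding how successive convnet layers transform their input, and for getting a first idea of the meaning of individual convnet filters.
  • Visualizing convnet filters. This is useful for understanding precisely what visual pattern or concept each filter in a convnet is receptive to.
  • Visualizing heatmaps of class activation in an image. This is useful for understanding which parts of an image contributed most to a given classification decision, and it also makes it possible to localize objects in images.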

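For the first method, activation visualization, we will use the small convnet that we trained from scratch on the cat vs. dog classification problem. For the next two methods, we will use the VGG16 model.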



>>> from keras.models import load_model
>>> model = load_model('cats_and_dogs_small_2.h5')
>>> model.summary() # As a reminder.
Layer (type) Output Shape Param #
conv2d_5 (Conv2D) (None, 148, 148, 32) 896
maxpooling2d_5 (MaxPooling2D) (None, 74, 74, 32) 0
conv2d_6 (Conv2D) (None, 72, 72, 64) 18496
maxpooling2d_6 (MaxPooling2D) (None, 36, 36, 64) 0
conv2d_7 (Conv2D) (None, 34, 34, 128) 73856
maxpooling2d_7 (MaxPooling2D) (None, 17, 17, 128) 0
conv2d_8 (Conv2D) (None, 15, 15, 128) 147584
maxpooling2d_8 (MaxPooling2D) (None, 7, 7, 128) 0
flatten_2 (Flatten) (None, 6272) 0
dropout_1 (Dropout) (None, 6272) 0
dense_3 (Dense) (None, 512) 3211776
dense_4 (Dense) (None, 1) 513
Total params: 3,453,121
Trainable params: 3,453,121
Non-trainable params: 0
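The shapes and parameter counts in the summary above can be checked by hand. The sketch below assumes 3x3 kernels with no padding, 2x2 max pooling, and a 150x150x3 input; the summary does not state the kernel and pool sizes, but those values reproduce its numbers. The helper names are my own.

```python
def conv_out(size, kernel=3):
    """Spatial size after an unpadded convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool

def conv_params(in_ch, out_ch, kernel=3):
    """Kernel weights plus one bias per output channel."""
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(in_units, out_units):
    """Weight matrix plus one bias per output unit."""
    return in_units * out_units + out_units

size, channels = 150, 3          # assumed input: 150x150 RGB
sizes, params = [], []
for filters in (32, 64, 128, 128):
    size = conv_out(size)
    sizes.append(size)
    params.append(conv_params(channels, filters))
    channels = filters
    size = pool_out(size)
    sizes.append(size)

flat = size * size * channels    # length of the flattened vector
params += [dense_params(flat, 512), dense_params(512, 1)]
total = sum(params)
print(sizes)   # [148, 74, 72, 36, 34, 17, 15, 7]
print(flat)    # 6272
print(total)   # 3453121
```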


img_path = '/Users/fchollet/Downloads/cats_and_dogs_small/test/cats/cat.1700.jpg'
# We preprocess the image into a 4D tensor
from keras.preprocessing import image
import numpy as np
img = image.load_img(img_path, target_size=(150, 150))
img_tensor = image.img_to_array(img)
img_tensor = np.expand_dims(img_tensor, axis=0)
# Remember that the model was trained on inputs
# that were preprocessed in the following way:
img_tensor /= 255.
# Its shape is (1, 150, 150, 3)


import matplotlib.pyplot as plt
plt.imshow(img_tensor[0])
Our test cat picture

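To extract the feature maps we want to look at, we will create a Keras model that takes batches of images as input and outputs the activations of all convolution and pooling layers. To do this, we will use the Keras `Model` class. A `Model` is instantiated with two arguments: an input tensor (or list of input tensors) and an output tensor (or list of output tensors). The resulting object is a Keras model, just like the `Sequential` models you are familiar with, mapping the specified inputs to the specified outputs. What sets the `Model` class apart is that it allows for models with multiple outputs, unlike `Sequential`. For more information about the `Model` class, see Chapter 7, Section 1 of the book.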

from keras import models
# Extracts the outputs of the top 8 layers:
layer_outputs = [layer.output for layer in model.layers[:8]]
# Creates a model that will return these outputs, given the model input:
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)


# This will return a list of 8 Numpy arrays:
# one array per layer activation
activations = activation_model.predict(img_tensor)
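To make the idea of "one model, many outputs" concrete without Keras, here is a framework-free sketch. The three stand-in layers are hypothetical toy functions, not Keras layers; the point is only that a single forward pass can hand back every intermediate result as a list, one array per layer.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def run_with_activations(layers, batch):
    """Apply each layer in order and keep every intermediate output."""
    outputs = []
    x = batch
    for layer in layers:
        x = layer(x)
        outputs.append(x)
    return outputs

# Three stand-in "layers" (hypothetical, for illustration only).
toy_layers = [
    lambda x: relu(x - 0.5),                     # thresholded response
    lambda x: x[:, ::2, ::2, :],                 # crude 2x downsampling
    lambda x: x.mean(axis=-1, keepdims=True),    # collapse the channels
]

batch = np.random.RandomState(0).rand(1, 8, 8, 3)
toy_activations = run_with_activations(toy_layers, batch)
shapes = [a.shape for a in toy_activations]
print(shapes)  # [(1, 8, 8, 3), (1, 4, 4, 3), (1, 4, 4, 1)]
```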


>>> first_layer_activation = activations[0]
>>> print(first_layer_activation.shape)
(1, 148, 148, 32)

This is a 148 × 148 feature map with 32 channels. Let's look at the channel at index 4:

import matplotlib.pyplot as plt
plt.matshow(first_layer_activation[0, :, :, 4], cmap='viridis')
4th channel of the activation of the first layer on our test cat picture


plt.matshow(first_layer_activation[0, :, :, 7], cmap='viridis')
7th channel of the activation of the first layer on our test cat picture


# These are the names of the layers, so we can have them as part of our plot
layer_names = []
for layer in model.layers[:8]:
    layer_names.append(layer.name)

images_per_row = 16

# Now let's display our feature maps
for layer_name, layer_activation in zip(layer_names, activations):
    # This is the number of features in the feature map
    n_features = layer_activation.shape[-1]
    # The feature map has shape (1, size, size, n_features)
    size = layer_activation.shape[1]
    # We will tile the activation channels in this matrix
    n_cols = n_features // images_per_row
    display_grid = np.zeros((size * n_cols, images_per_row * size))
    # We'll tile each filter into this big horizontal grid
    for col in range(n_cols):
        for row in range(images_per_row):
            # Copy so that post-processing does not modify `activations`
            channel_image = layer_activation[0, :, :,
                                             col * images_per_row + row].copy()
            # Post-process the feature to make it visually palatable
            channel_image -= channel_image.mean()
            # The small constant avoids dividing by 0 for blank channels
            channel_image /= (channel_image.std() + 1e-5)
            channel_image *= 64
            channel_image += 128
            channel_image = np.clip(channel_image, 0, 255).astype('uint8')
            display_grid[col * size : (col + 1) * size,
                         row * size : (row + 1) * size] = channel_image
    # Display the grid
    scale = 1. / size
    plt.figure(figsize=(scale * display_grid.shape[1],
                        scale * display_grid.shape[0]))
    plt.title(layer_name)
    plt.grid(False)
    plt.imshow(display_grid, aspect='auto', cmap='viridis')
plt.show()
Every channel of every layer activation on our test cat picture


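  • The first layer acts as a collection of various edge detectors. At that stage, the activations still retain almost all of the information present in the original picture.
  • Going higher up, the activations become increasingly abstract and less visually interpretable. They start encoding higher-level concepts such as "cat ear" or "cat eye". Higher-level representations carry less and less information about the visual contents of the image, and more and more information about the class of the image.
  • The sparsity of the activations increases with the depth of the layer: in the first layer, all filters are activated by the input image, but in the following layers more and more filters are blank. This means that the pattern encoded by the filter is not found in the input image.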
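The sparsity observation in the last bullet can be turned into a number. The sketch below is my own helper, not something from the book: it counts the fraction of channels in one layer's activation that are zero everywhere, and is tested on a small synthetic array instead of real activations.

```python
import numpy as np

def empty_channel_fraction(layer_activation, eps=1e-8):
    """Fraction of channels whose activation is (near) zero everywhere."""
    # layer_activation has shape (1, height, width, n_channels)
    per_channel_max = np.abs(layer_activation[0]).max(axis=(0, 1))
    return float(np.mean(per_channel_max < eps))

# Synthetic stand-in for one layer's activation: 4 channels, 2 of them blank.
fake = np.zeros((1, 5, 5, 4), dtype="float32")
fake[0, 1, 2, 0] = 0.7
fake[0, 3, 3, 2] = 1.2
fraction = empty_channel_fraction(fake)
print(fraction)  # 0.5
```

With the real list of arrays from `activation_model.predict`, something like `[empty_channel_fraction(a) for a in activations]` would give one value per layer, which should tend to grow with depth if the observation holds for your model.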


Left: attempts to draw a bicycle from memory. Right: what a schematic bicycle should look like.


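Another easy thing to do is to inspect the filters learned by convnets by displaying the visual pattern that each filter is meant to respond to. This can be done with gradient ascent in input space: we apply gradient descent to the values of the input image of a convnet so as to maximize the response of a specific filter, starting from a blank input image. The resulting input image is one that the chosen filter is maximally responsive to.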

from keras.applications import VGG16
from keras import backend as K
model = VGG16(weights='imagenet',
              include_top=False)
layer_name = 'block3_conv1'
filter_index = 0
layer_output = model.get_layer(layer_name).output
loss = K.mean(layer_output[:, :, :, filter_index])


# The call to `gradients` returns a list of tensors (of size 1 in this case)
# hence we only keep the first element -- which is a tensor.
grads = K.gradients(loss, model.input)[0]


# We add 1e-5 before dividing so as to avoid accidentally dividing by 0.
grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-5)
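The effect of this normalization is easy to check with plain Numpy. In the sketch below, `rms_normalize` is my own name for the same operation; it shows that gradients of very different magnitudes end up with a root-mean-square value close to 1, so each update to the input image stays in a similar range.

```python
import numpy as np

def rms_normalize(grads, eps=1e-5):
    """Divide by the root-mean-square of the gradient, as in the line above."""
    return grads / (np.sqrt(np.mean(np.square(grads))) + eps)

def rms(a):
    return float(np.sqrt(np.mean(np.square(a))))

rng = np.random.RandomState(0)
small = rng.randn(1, 4, 4, 3) * 1e-3   # tiny gradient values
large = small * 1e6                     # same direction, huge magnitude

rms_small = rms(rms_normalize(small))
rms_large = rms(rms_normalize(large))
print(rms_small, rms_large)  # both close to 1
```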

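Now we need a way to compute the value of the loss tensor and the gradient tensor, given an input image. We can define a Keras backend function to do this: `iterate` is a function that takes a list containing one Numpy tensor and returns a list of two Numpy tensors: the loss value and the gradient value.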

iterate = K.function([model.input], [loss, grads])
# Let's test it:
import numpy as np
loss_value, grads_value = iterate([np.zeros((1, 150, 150, 3))])


# We start from a gray image with some noise
input_img_data = np.random.random((1, 150, 150, 3)) * 20 + 128.
# Run gradient ascent for 40 steps
step = 1. # this is the magnitude of each gradient update
for i in range(40):
 # Compute the loss value and gradient value
 loss_value, grads_value = iterate([input_img_data])
 # Here we adjust the input image in the direction that maximizes the loss
 input_img_data += grads_value * step
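The same loop can be tried without a network. The sketch below replaces the VGG16 filter with a toy "filter response" `mean(w * x)` whose gradient is known analytically; `w` and `loss_and_grads` are hypothetical stand-ins for the Keras `iterate` function. It only illustrates that repeatedly adding the normalized gradient to the input drives the response up.

```python
import numpy as np

rng = np.random.RandomState(0)
w = rng.randn(8, 8, 3)   # stand-in "filter" pattern (hypothetical)

def loss_and_grads(x):
    """Toy filter response and its normalized gradient with respect to x."""
    loss = np.mean(w * x)
    grads = w / w.size                       # d(mean(w * x)) / dx
    grads = grads / (np.sqrt(np.mean(np.square(grads))) + 1e-5)
    return loss, grads

x = rng.rand(8, 8, 3) * 20 + 128.0           # gray image with some noise
initial_loss, _ = loss_and_grads(x)

step = 1.0
for _ in range(40):
    loss_value, grads_value = loss_and_grads(x)
    x += grads_value * step                  # move the input uphill

final_loss, _ = loss_and_grads(x)
print(initial_loss, final_loss)
```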


def deprocess_image(x):
 # normalize tensor: center on 0., ensure std is 0.1
 x -= x.mean()
 x /= (x.std() + 1e-5)
 x *= 0.1
 # clip to [0, 1]
 x += 0.5
 x = np.clip(x, 0, 1)
 # convert to RGB array
 x *= 255
 x = np.clip(x, 0, 255).astype('uint8')
 return x


def generate_pattern(layer_name, filter_index, size=150):
 # Build a loss function that maximizes the activation
 # of the nth filter of the layer considered.
 layer_output = model.get_layer(layer_name).output
 loss = K.mean(layer_output[:, :, :, filter_index])
 # Compute the gradient of the input picture wrt this loss
 grads = K.gradients(loss, model.input)[0]
 # Normalization trick: we normalize the gradient
 grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-5)
 # This function returns the loss and grads given the input picture
 iterate = K.function([model.input], [loss, grads])
 # We start from a gray image with some noise
 input_img_data = np.random.random((1, size, size, 3)) * 20 + 128.
 # Run gradient ascent for 40 steps
 step = 1.
 for i in range(40):
     loss_value, grads_value = iterate([input_img_data])
     input_img_data += grads_value * step
 img = input_img_data[0]
 return deprocess_image(img)


>>> plt.imshow(generate_pattern('block3_conv1', 0))

Pattern that the 0th channel in layer block3_conv1 maximally responds to

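The following code arranges the patterns of the first 64 filters of a layer on an 8 × 8 grid, each pattern being 64 × 64 pixels, with black margins between them.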

layer_name = 'block1_conv1'
size = 64
margin = 5
# This is an empty (black) image where we will store our results.
results = np.zeros((8 * size + 7 * margin, 8 * size + 7 * margin, 3))
for i in range(8):  # iterate over the rows of our results grid
    for j in range(8):  # iterate over the columns of our results grid
        # Generate the pattern for filter `i + (j * 8)` in `layer_name`
        filter_img = generate_pattern(layer_name, i + (j * 8), size=size)
        # Put the result in the square `(i, j)` of the results grid
        horizontal_start = i * size + i * margin
        horizontal_end = horizontal_start + size
        vertical_start = j * size + j * margin
        vertical_end = vertical_start + size
        results[horizontal_start: horizontal_end, vertical_start: vertical_end, :] = filter_img
# Display the results grid; values are in [0, 255], so cast to uint8 for imshow
plt.figure(figsize=(20, 20))
plt.imshow(results.astype('uint8'))
plt.show()
Filter patterns for layer block1_conv1

Filter patterns for layer block2_conv1

Filter patterns for layer block3_conv1

Filter patterns for layer block4_conv1
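The index arithmetic used to tile the filter patterns is worth spelling out. The sketch below uses two helper functions of my own naming and the same values as the tiling code (64-pixel tiles, 5-pixel margins, 8 tiles per side); it shows where each tile starts and that the last tile ends exactly at the edge of the results array.

```python
def tile_origin(index, size=64, margin=5):
    """Start offset of the tile at position `index` along one axis."""
    return index * size + index * margin

def grid_extent(n_tiles=8, size=64, margin=5):
    """Total length of one grid side: n tiles and n - 1 margins."""
    return n_tiles * size + (n_tiles - 1) * margin

extent = grid_extent()
origins = [tile_origin(i) for i in range(8)]
print(extent)               # 547
print(origins)              # [0, 69, 138, 207, 276, 345, 414, 483]
print(origins[-1] + 64)     # 547: the last tile ends at the grid edge
```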


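  • The filters from the first layer, block1_conv1, encode simple directional edges and colors, or in some cases colored edges.
  • The filters from block2_conv1 encode simple textures made from combinations of edges and colors.
  • The filters in higher layers start to resemble textures found in natural images: feathers, eyes, leaves, and so on.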


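This general category of techniques is called "Class Activation Map" (CAM) visualization, and consists of producing heatmaps of class activation over input images. A class activation heatmap is a 2D grid of scores associated with a specific output class, computed for every location in an input image, indicating how important each location is with respect to that class. For instance, given an image fed into our "cat vs. dog" convnet, class activation map visualization lets us generate a heatmap for the class "cat", indicating how cat-like different parts of the image are, and likewise for the class "dog".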

from keras.applications.vgg16 import VGG16
# Note that we are including the densely-connected classifier on top;
# all previous times, we were discarding it.
model = VGG16(weights='imagenet')


Our test picture of African elephants

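The full VGG16 model expects inputs of size 224 × 224, so we load the image at that size, convert it to a Numpy array, and apply the model's preprocessing: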

from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input, decode_predictions
import numpy as np
# The local path to our target image
img_path = '/Users/fchollet/Downloads/creative_commons_elephant.jpg'
# `img` is a PIL image of size 224x224
img = image.load_img(img_path, target_size=(224, 224))
# `x` is a float32 Numpy array of shape (224, 224, 3)
x = image.img_to_array(img)
# We add a dimension to transform our array into a "batch"
# of size (1, 224, 224, 3)
x = np.expand_dims(x, axis=0)
# Finally we preprocess the batch
# (this does channel-wise color normalization)
x = preprocess_input(x)


>>> preds = model.predict(x)
>>> print('Predicted:', decode_predictions(preds, top=3)[0])
Predicted: [('n02504458', 'African_elephant', 0.92546833),
('n01871265', 'tusker', 0.070257246),
('n02504013', 'Indian_elephant', 0.0042589349)]


  • African elephant (92.5%)
  • Tusker (7%)
  • Indian elephant (0.4%)

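So our network has recognized the image as containing an undetermined quantity of African elephants. The entry in the prediction vector that is maximally activated is the one for the "African elephant" class, at index 386.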

>>> np.argmax(preds[0])
386


# This is the "african elephant" entry in the prediction vector
african_elephant_output = model.output[:, 386]
# This is the output feature map of the `block5_conv3` layer,
# the last convolutional layer in VGG16
last_conv_layer = model.get_layer('block5_conv3')
# This is the gradient of the "african elephant" class with regard to
# the output feature map of `block5_conv3`
grads = K.gradients(african_elephant_output, last_conv_layer.output)[0]
# This is a vector of shape (512,), where each entry
# is the mean intensity of the gradient over a specific feature map channel
pooled_grads = K.mean(grads, axis=(0, 1, 2))
# This function allows us to access the values of the quantities we just defined:
# `pooled_grads` and the output feature map of `block5_conv3`,
# given a sample image
iterate = K.function([model.input], [pooled_grads, last_conv_layer.output[0]])
# These are the values of these two quantities, as Numpy arrays,
# given our sample image of two elephants
pooled_grads_value, conv_layer_output_value = iterate([x])
# We multiply each channel in the feature map array
# by "how important this channel is" with regard to the elephant class
for i in range(512):
 conv_layer_output_value[:, :, i] *= pooled_grads_value[i]
# The channel-wise mean of the resulting feature map
# is our heatmap of class activation
heatmap = np.mean(conv_layer_output_value, axis=-1)
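The channel-weighting step does not need an explicit loop: Numpy broadcasting gives the same heatmap in one expression. The sketch below uses random stand-in arrays with the shapes that `block5_conv3` produces for a 224 × 224 input, namely (14, 14, 512) for the feature map and (512,) for the pooled gradients; the values are synthetic, only the shapes match the real case.

```python
import numpy as np

rng = np.random.RandomState(0)
# Synthetic stand-ins with the shapes used above (values are random).
feature_map = rng.rand(14, 14, 512).astype("float32")
channel_weights = rng.randn(512).astype("float32")

# Loop form, as in the code above (on a copy, so the input is untouched).
looped = feature_map.copy()
for i in range(512):
    looped[:, :, i] *= channel_weights[i]
heatmap_loop = np.mean(looped, axis=-1)

# Broadcasting form: the (512,) vector is applied along the last axis.
heatmap_vec = np.mean(feature_map * channel_weights, axis=-1)

same = bool(np.allclose(heatmap_loop, heatmap_vec))
print(heatmap_vec.shape, same)
```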


heatmap = np.maximum(heatmap, 0)
heatmap /= np.max(heatmap)
plt.matshow(heatmap)
African elephant class activation heatmap over our test picture


import cv2
# We use cv2 to load the original image
img = cv2.imread(img_path)
# We resize the heatmap to have the same size as the original image
heatmap = cv2.resize(heatmap, (img.shape[1], img.shape[0]))
# We convert the heatmap to RGB
heatmap = np.uint8(255 * heatmap)
# We apply the heatmap to the original image
heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)
# 0.4 here is a heatmap intensity factor
superimposed_img = heatmap * 0.4 + img
# Save the image to disk
cv2.imwrite('/Users/fchollet/Downloads/elephant_cam.jpg', superimposed_img)
Superimposing the class activation heatmap with the original picture
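One detail to watch in the superimposition: `heatmap * 0.4 + img` is a float array whose values can exceed 255. How out-of-range values are handled on write may depend on the OpenCV version, so a defensive option (my own assumption, not something the book does) is to clip and cast explicitly before saving. The helper name `blend` is hypothetical.

```python
import numpy as np

def blend(heatmap_bgr, img_bgr, intensity=0.4):
    """Overlay a colored heatmap on an image and return a valid uint8 image."""
    mixed = heatmap_bgr.astype("float32") * intensity + img_bgr.astype("float32")
    return np.clip(mixed, 0, 255).astype("uint8")

# Tiny synthetic example: a bright image plus a saturated heatmap.
img = np.full((2, 2, 3), 200, dtype="uint8")
heat = np.full((2, 2, 3), 255, dtype="uint8")
out = blend(heat, img)
print(out.dtype, out.max())  # uint8 255
```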


  • Why did the network think this image contained an African elephant?
  • Where is the African elephant located in the picture?




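  • Convnets are the best tool for attacking image classification problems.
  • They work by learning a hierarchy of modular patterns and concepts to represent the visual world.
  • The representations they learn are easy to inspect: convnets are the opposite of black boxes.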


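  • You are able to train your own convnet from scratch to solve an image classification problem.
  • You understand how to use visual data augmentation to fight overfitting.
  • You know how to use a pretrained convnet to do feature extraction and fine-tuning.
  • You can generate visualizations of the filters learned by your convnets, as well as heatmaps of class activation.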
