This method generates a synthetic image that maximally activates a neuron. We start from a test image and visualize which parts of the filters in a given convolution layer are activated during the forward pass. We then backpropagate to compute the gradients of the neuron activations in those filters with respect to the image pixels and update the image with these gradients (gradient ascent). For better interpretation, we normalize these gradients by their L2 norm and apply a few more regularization techniques. Finally, we subtract the original input image from the updated one to visualize the activated parts of the filters.
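The core update loop can be sketched roughly as follows. This is a minimal illustration assuming a stock TensorFlow/Keras VGG16, so layer names such as `block5_conv3` follow Keras conventions rather than the repository's `Conv5_3` naming; the actual script starts from the supplied test image and may apply further regularization.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import VGG16

# Load a pre-trained VGG16 and build a sub-model that exposes the target layer.
model = VGG16(weights="imagenet", include_top=False)
layer_name = "block5_conv3"          # assumed Keras name; the script uses e.g. Conv5_3
filter_index = 0                     # which filter in that layer to maximize
feature_extractor = tf.keras.Model(model.inputs,
                                   model.get_layer(layer_name).output)

# Start from an image (here grey noise; the script would use the test image).
img = tf.Variable(np.random.uniform(120, 130, (1, 224, 224, 3)).astype("float32"))

for _ in range(30):                  # number of gradient ascent iterations
    with tf.GradientTape() as tape:
        activation = feature_extractor(img)
        # Mean activation of the chosen filter is the quantity we maximize.
        loss = tf.reduce_mean(activation[..., filter_index])
    grads = tape.gradient(loss, img)
    # Normalize the gradients by their L2 norm for smoother updates.
    grads /= tf.sqrt(tf.reduce_mean(tf.square(grads))) + 1e-8
    img.assign_add(grads)            # gradient *ascent* step
```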
usage:
viz_gradient_ascent.py [--iterations ITERATIONS] [--img IMG] [--weights_path WEIGHTS_PATH] [--layer LAYER] [--num_filters NUM_FILTERS] [--size SIZE]
Arguments:
--iterations INT - Number of gradient ascent iterations
--img STRING - Path of the input image
--weights_path STRING - Path of the saved pre-trained model
--layer STRING - Name of layer to use
--num_filters INT - Number of filters to visualize
--size INT - Image size
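For example, a hypothetical invocation (the file names and values below are placeholders, not files shipped with the repository) might look like:

python viz_gradient_ascent.py --iterations 30 --img images/bird.jpg --weights_path vgg16_weights.h5 --layer Conv5_3 --num_filters 16 --size 224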
Suppose the test image is of a bird:
After one forward pass, if we visualize the filters in the first layer (Conv1_1), we can clearly see a bird-like shape in some filters. These shapes correspond to activated neurons in the filters, which further help the CNN model recognize objects in the image.
Layer Conv1_1 (All 64 filters)
Similarly, we visualize the filters in the second layer (Conv1_2), where the activation maps are noisier, but a few filters still show a bird-like shape in their activation maps.
Layer Conv1_2 (All 64 filters)
Finally, we visualize the last convolution layer, as it most directly determines the output of the model. We select 16 random filters (out of 512), and among them one filter convincingly captures the shape of the bird with finer detail.
Layer Conv5_3 (Randomly chosen 16 filters)
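Outside the script, the same per-filter activation maps can be reproduced with a few lines of Keras. The sketch below is an illustration under assumptions: it uses the stock keras.applications VGG16 (where Conv1_1 is named `block1_conv1`) and a placeholder image path.

```python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.preprocessing import image

model = VGG16(weights="imagenet")
# Sub-model that outputs the activations of the layer we want to inspect.
layer_name = "block1_conv1"           # assumed Keras name for Conv1_1
activation_model = tf.keras.Model(model.inputs, model.get_layer(layer_name).output)

# Load and preprocess the bird test image (path is a placeholder).
img = image.load_img("images/bird.jpg", target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

# One forward pass; activations have shape (1, H, W, num_filters).
activations = activation_model.predict(x)

# Plot all 64 filter activation maps of the first layer in an 8x8 grid.
fig, axes = plt.subplots(8, 8, figsize=(12, 12))
for i, ax in enumerate(axes.flat):
    ax.imshow(activations[0, :, :, i], cmap="viridis")
    ax.axis("off")
plt.show()
```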
This method finds the part of an input image that an output neuron responds to. It slides a blank occluding window over different parts of the image and monitors the output of the classifier model at each position. This representation helps us localize objects within the image, because the probability of the correct class drops when a significant portion of the object is occluded.
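A minimal sketch of this sliding-window procedure is shown below; the function name, signature, and defaults are illustrative assumptions rather than the actual implementation in viz_occlusion.py. It expects an already-preprocessed image array and a Keras classifier.

```python
import numpy as np

def occlusion_heatmap(model, img, class_index, occ_size=50, stride=10, pixel=0):
    """Slide a square occluding patch over the image and record the
    model's probability for `class_index` at each patch position."""
    h, w, _ = img.shape
    heatmap = np.zeros(((h - occ_size) // stride + 1,
                        (w - occ_size) // stride + 1))
    for i, y in enumerate(range(0, h - occ_size + 1, stride)):
        for j, x in enumerate(range(0, w - occ_size + 1, stride)):
            occluded = img.copy()
            occluded[y:y + occ_size, x:x + occ_size, :] = pixel   # blank window
            probs = model.predict(occluded[np.newaxis, ...], verbose=0)[0]
            heatmap[i, j] = probs[class_index]   # probability of the true class
    return heatmap
```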
usage:
viz_occlusion.py [--img IMG] [--weights_path WEIGHTS_PATH] [--size SIZE] [--occ_size OCC_SIZE] [--pixel PIXEL] [--stride STRIDE] [--norm NORM] [--percentile PERCENTILE]
Arguments:
--img STRING - Path of the input image
--weights_path STRING - Path of the saved pre-trained model
--size INT - Image size
--occ_size INT - Size of occluding window
--pixel INT - Occluding window - pixel values
--stride INT - Occlusion Stride
--norm INT - Normalize probabilities first
--percentile INT - Regularization percentile for heatmap
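A hypothetical run (all paths and values below are placeholders) could be:

python viz_occlusion.py --img images/school_bus.jpg --weights_path vgg16_weights.h5 --size 224 --occ_size 50 --pixel 0 --stride 10 --norm 1 --percentile 90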
For the given test image:
The pre-trained CNN model predicted the class School Bus with the highest probability of 0.86515594.
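Such a prediction can be obtained with the standard Keras VGG16 pipeline; the sketch below assumes the stock ImageNet weights and a placeholder image path, so the exact probability may differ from the one reported above.

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image

model = VGG16(weights="imagenet")
img = image.load_img("images/school_bus.jpg", target_size=(224, 224))  # placeholder path
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
preds = model.predict(x)
# decode_predictions maps the 1000-way softmax to (class_id, name, probability) tuples.
print(decode_predictions(preds, top=1)[0])   # e.g. [('n04146614', 'school_bus', 0.86...)]
```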
The following figure shows the output probability for the School Bus class as a function of occluder position:
To clearly localize the object, we regularize the above heatmap to extract the strongest features.
The images below show the input image before and after projecting the regularized heat-map onto it. This confirms that the above visualization genuinely corresponds to the object structure that stimulates these features.
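One plausible way to implement this percentile-based regularization and projection is sketched below; the helper names and the use of scikit-image for resizing are assumptions, not the repository's actual code.

```python
import numpy as np
from skimage.transform import resize   # assumed dependency for upscaling the heat-map

def regularize_heatmap(heatmap, percentile=90):
    """Keep only the strongest responses: zero out everything below the
    given percentile so the surviving regions localize the object."""
    # The class probability drops most where the object is occluded,
    # so (max - heatmap) serves as the "importance" signal.
    importance = heatmap.max() - heatmap
    threshold = np.percentile(importance, percentile)
    return np.where(importance >= threshold, importance, 0.0)

def project_on_image(img, mask):
    """Upscale the regularized heat-map to the image size and use it to
    mask the input, highlighting the object region."""
    mask = resize(mask, img.shape[:2], order=1, mode="constant")
    mask = mask / (mask.max() + 1e-8)
    return (img.astype("float32") * mask[..., np.newaxis]).astype("uint8")
```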
- Visualizing what ConvNets learn - Andrej Karpathy. (link)
- Convolutional Neural Networks for Visual Recognition - CS231n. (lecture)
- How convolutional neural networks see the world - Francois Chollet. (link, github)
- Visualizing and Understanding Convolutional Networks - Matthew D Zeiler and Rob Fergus. (paper)
- Occlusion experiments - DaoYu Lin. (github)