
How Do Deep Neural Networks Look for Features in Images? With Keras and Google Colab

Extracting Conv Layer Output from Hidden Layers

What’s in the net? (Source: Pixabay)

When I first started moving from standard machine learning algorithms like logistic regression and support vector machines towards deep learning and neural networks, I often thought of the deep layers in a network as ‘black boxes’. That false impression went away once I learned to plot the outputs of the intermediate convolutional layers, and it almost became an obsession to randomly select images and see what’s happening in each layer. Today, I would like to give a step-by-step description of how you can extract features from hidden conv. layers using Keras (running on top of TensorFlow). For simplicity, I took the dogs vs. cats data set and built a VGG16¹-like model, so the problem essentially boils down to a binary classification problem.

What you can expect to learn from this post –

  • Learn to use Google Colab to train your deep learning models. I found this extremely useful, as you can use a cloud GPU for free, with 12.72 GB RAM and 350 GB disk space.
  • Extract hidden conv. layer outputs using Keras.
  • Two different ways to tile these outputs to form a compact image.

So without delay let’s get started!


Setting up the Google Colab Environment:

If you don’t have a GPU and large CPU resources, Google Colab can come to your rescue for training moderate to heavy deep networks. Currently, Colab offers a 12 GB Nvidia Tesla GPU that can be used for up to 12 hours continuously. Provided you are accustomed to working in a Jupyter environment, you can easily settle in with Google Colab. Detailed tutorials on using Colab² are provided by Google; here, I describe the two steps that are necessary to follow along with this tutorial.

  1. Using GPU: To access the GPU, you need to change the runtime type. The pictures below show the Colab environment.
  2. Mount Your Drive: You need to mount your Google Drive to access files from it. For that, run the commands below –
from google.colab import drive
drive.mount('/content/gdrive')

The URL will provide you with a one-time authorization code; copy and paste it in the box below and press enter. You will get the confirmation –

Mounted at /content/gdrive. 

After this, you are ready to use files and folders directly from your drive.
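For example, you can list the contents of a drive folder just like a local directory; the path below is only an illustration and should point to wherever your data lives –

import os
# list files in the mounted drive (illustrative path)
print (os.listdir('/content/gdrive/My Drive'))

Now let’s dive into the tutorial.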


Train Deep Neural Net with Keras inside Colab

Let’s build our model using Keras Sequential

To finish training faster, I used a model that is more like a mini version of the VGG16 architecture (blocks of two conv. layers followed by a pooling layer), with the input size set to (160, 160). Let’s check the model summary –
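A minimal sketch of such a model is given below. Only the (160, 160, 3) input, the 16-filter first block and the binary output follow directly from the text; the filter counts of the deeper blocks, the 'same' padding, the 512-unit dense layer and the 0.5 dropout rate are assumptions, and the layer names simply match the layer list printed later in this post –

from keras import layers, models

model = models.Sequential([
    layers.Conv2D(16, (3, 3), activation='relu', padding='same',
                  input_shape=(160, 160, 3)),
    layers.Conv2D(16, (3, 3), activation='relu', padding='same', name='block0_conv2'),
    layers.MaxPooling2D((2, 2), name='block0_pool1'),
    layers.Conv2D(32, (3, 3), activation='relu', padding='same', name='block1_conv1'),
    layers.Conv2D(32, (3, 3), activation='relu', padding='same', name='block1_conv2'),
    layers.MaxPooling2D((2, 2), name='block1_pool1'),
    layers.Conv2D(64, (3, 3), activation='relu', padding='same', name='block2_conv1'),
    layers.Conv2D(64, (3, 3), activation='relu', padding='same', name='block2_conv2'),
    layers.MaxPooling2D((2, 2), name='block2_pool1'),
    layers.Conv2D(128, (3, 3), activation='relu', padding='same', name='block3_conv1'),
    layers.MaxPooling2D((2, 2), name='block3_pool'),
    layers.Flatten(),
    layers.Dense(512, activation='relu'),   # dense width is an assumption
    layers.Dropout(0.5),                    # dropout rate is an assumption
    layers.Dense(1, activation='sigmoid'),  # binary output: cat vs. dog
])
model.summary()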

I guess most of you know how to count parameters, but let’s go through the first few layers to review this. For the first layer, the input image size is (160, 160) with 3 channels (n_c). The filter size (f) is (3, 3) and the number of filters (n_f) is 16. So the total number of weights is f × f × n_c × n_f = 3 × 3 × 3 × 16 = 432, and the number of biases is n_f = 16, giving 448 parameters in total. Similarly, for the second layer we have weights = 3 × 3 × 16 × 16 = 2304 and biases = 16, so the total number of parameters is 2320, and so on…
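As a quick check of this arithmetic –

f, n_c, n_f = 3, 3, 16
print (f*f*n_c*n_f + n_f)   # 448 parameters in the first conv layer
print (3*3*16*16 + 16)      # 2320 parameters in the second conv layer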

Data Pre-Processing: Before using the Keras ImageDataGenerator class, we have to remember that here we will use files and folders directly from Google Drive, so we have to be precise about the file paths. Let’s see the modified code blocks –
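A sketch of how these generators can look; the directory names under base_dir are placeholders for wherever the dogs vs. cats folders sit in your drive, and the batch size of 20 is an assumption –

import os
from keras.preprocessing.image import ImageDataGenerator

# hypothetical drive paths; adjust them to your own folder structure
base_dir = '/content/gdrive/My Drive/Colab Notebooks/cats_and_dogs'
train_dir = os.path.join(base_dir, 'train')
validation_dir = os.path.join(base_dir, 'validation')

train_datagen = ImageDataGenerator(rescale=1./255)
val_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(160, 160),   # matches the model input size
    batch_size=20,            # assumed batch size
    class_mode='binary')      # dogs vs. cats is binary

validation_generator = val_datagen.flow_from_directory(
    validation_dir,
    target_size=(160, 160),
    batch_size=20,
    class_mode='binary')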

I used only 2800 images for training and 600 images for validation to save time. The next steps, inevitably, are compiling and fitting the model –
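A sketch of the compile and fit steps; the 100 epochs come from the text, while the optimizer, learning rate and step counts are assumptions based on the image counts above and a batch size of 20 –

from keras import optimizers

model.compile(loss='binary_crossentropy',
              optimizer=optimizers.RMSprop(lr=1e-4),  # assumed optimizer and learning rate
              metrics=['acc'])

history = model.fit_generator(
    train_generator,
    steps_per_epoch=140,        # 2800 training images / batch size 20
    epochs=100,
    validation_data=validation_generator,
    validation_steps=30)        # 600 validation images / batch size 20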

I trained for 100 epochs and, with these parameter settings, reached accuracies of 89% and 83% on the training and validation data respectively. In Google Colab with a GPU, it takes around 75–80 minutes to train this model.

I tried predicting class labels on some random images downloaded from the internet –
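A sketch of how such a prediction can be made on a single downloaded image; 'test_image.jpg' is a placeholder file name, and the class mapping assumes flow_from_directory assigned cats to 0 and dogs to 1 –

import numpy as np
from keras.preprocessing.image import load_img, img_to_array

img = load_img('test_image.jpg', target_size=(160, 160))   # placeholder file name
x = img_to_array(img) / 255.        # same rescaling as the training generator
x = np.expand_dims(x, axis=0)       # add the batch dimension
prob = model.predict(x)[0][0]
print ('dog' if prob > 0.5 else 'cat', prob)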

I see that 3 images, including one angry cat, were predicted as dogs. Rather than focusing on increasing accuracy, our aim here is to check the outputs from the hidden conv. layers and see how the different filters in a layer try to find different features in an image. Let’s do that.


Visualizing Conv Layer Outputs:

I will describe two methods to visualize the conv. layer outputs. They are rather similar, but the way of tiling the images is different. You can choose based on your preference…

1st Method: Stack the Layer Outputs Horizontally

Let’s check the layer names:

from keras.preprocessing.image import img_to_array, load_img
import random 
layer_names_list = [layr.name for layr in model.layers]
print ("layer names list: ", layer_names_list)  
>>> layer names list: ['conv2d_1', 'block0_conv2', 'block0_pool1', 'block1_conv1', 'block1_conv2', 'block1_pool1', 'block2_conv1', 'block2_conv2', 'block2_pool1', 'block3_conv1', 'block3_pool', 'flatten_1', 'dense_2', 'Dropout_1', 'dense_3']

I will select a few conv. layers from which I would like to see the output:

selected_layers = ['block0_conv2', 'block2_conv1', 'block2_conv2']
matched_indices = [i for i, item in enumerate(layer_names_list) if item in selected_layers]
print (matched_indices)
>>> [1, 6, 7]

To get the outputs from the selected layers, we will use the Keras layer.output attribute and append these outputs to a list. Let’s see:

selected_layers_outputs = []
for idx in matched_indices:
    # output tensor from each selected layer
    outputs = model.layers[idx].output
    selected_layers_outputs.append(outputs)

The next step is important: we will instantiate a new model that takes a random image (of either a cat or a dog) as input and gives the selected conv. layer outputs as its outputs. Check the Keras Model API for more details.

visual_model = keras.models.Model(inputs = model.input, outputs = selected_layers_outputs)

If you remember, the input to our original (VGG-like) model was batches of images of shape (None, 160, 160, 3). We will keep the same input dimensions, but since we only want to process 1 randomly selected image at a time, our batch size will be 1. First, let’s select an image randomly using random.choice, which returns a random element from a non-empty sequence.

dog_files = [os.path.join(dog_train_dir, f) for f in dog_train_images]
cat_files = [os.path.join(cat_train_dir, g) for g in cat_train_images]
random_cat_dog = random.choice(dog_files + cat_files)
print ("random file name: ", random_cat_dog)

In the next step, we want to resize this image, convert it to a NumPy array and, finally, reshape it to a consistent format (batch size, height, width, channels). Let’s do that using the Keras load_img and img_to_array functions together with NumPy.

rand_img = load_img(random_cat_dog, target_size=(160, 160))
rand_img_arr = img_to_array(rand_img)
print ("shape of selected image :", rand_img_arr.shape)
x_in = np.reshape(rand_img_arr, (1, 160, 160, 3)) # batch size 1
>>> shape of selected image : (160, 160, 3)

Once we have processed the image in a format that is suitable as an input for our model, let’s generate predictions from the model for the selected layers.

selected_feature_maps = visual_model.predict(x_in)

Now comes the part of arranging these predictions in such a way that it is possible to visualize the effect of each filter in the selected layers. This part is a little tricky and we need to unleash our playfulness with NumPy. Let me give a brief outline of how we can proceed. If you look back at model.summary(), you will see the output shapes: the last element of each tuple is the number of filters, while the first/second elements are the height/width of the feature map. First, we create a grid of zeros with shape (height, width × number of filters), so that later we can stack the outputs horizontally. Next, we loop over the number of filters. Remembering that the batch size is 1, we select a particular filter output from a selected layer like this (check the detailed code later) –

for i in range(n_filters):
  y = feat_map[0, :, :, i]

Then we standardize and post-process the output of each filter to make it visually recognizable. Finally, we place the filter output in the display grid (the grid of zeros) that we created before. Using matplotlib’s imshow, we can visualize the effect of each filter on a particular layer, with the images stacked side by side, as you can see in the figure below. I found a fantastic, detailed answer on how the imshow method works; please check it to better understand what happens on the last line of the second for loop in the code below.

for lr_name, feat_map in zip(selected_layers, selected_feature_maps):
  n_filters = feat_map.shape[-1]
  n_size = feat_map.shape[1]
  display_grid = np.zeros((n_size, n_size * n_filters))
  for i in range(n_filters):
    y = feat_map[0, :, :, i]
    y = y - y.mean()
    y = y/y.std()
    y = y*64
    y = y + 128
    y = np.clip(y, 0, 255).astype('uint8')  # keep values between 0 and 255
    display_grid[:, i * n_size : (i+1) * n_size] = y
  scale = 20./n_filters
  plt.figure(figsize=(scale * n_filters * 1.4, scale * 2))
  plt.title(lr_name, fontsize=16)
  plt.grid(False)
  plt.imshow(display_grid, aspect='auto', cmap='plasma')
  plt.savefig('/content/gdrive/My Drive/Colab Notebooks/cat_dog_visual_%s.png'%(lr_name), dpi=300)
Figure 1: Stacking output from each filter horizontally from 3 different convolutional layers.

Here we can see that the filters in the 2nd layer (block0_conv2), where the model still sees the complete (160, 160) input, mostly look for basic edges. But as we go deeper, the spatial size shrinks; in the block2_conv2 layer, for example, the feature maps are (40, 40), and here the visual information is almost unrecognizable, yet features related to the class of the image are picked up by the filters. You also see the number of sparse filters increasing as we go deeper in the network: with an increasing number of filters in each layer, the pattern encoded by a previous layer’s filters may not be present in the current input. That is why, almost always, you will see that in the first layer all filters are activated, but from the second layer on sparsity increases.

I found the previous method of stacking outputs horizontally reasonable but not visually compelling, so I present a second method, which I found in Francois Chollet’s book Deep Learning with Python. It is very similar to the first one, but instead of stacking the outputs from all filters horizontally, we arrange them in a rectangular grid. So the main task here is to determine the shape of the grid and then stack the filter outputs as before.

2nd Method: Arrange the Filter Outputs in a Grid

Here we take advantage of one property of the number of filters used in each layer: they are all multiples of 16. So each row of the display grid will hold 16 images, and the number of rows (called n_cols in the code below) depends on the number of filters in the selected convolutional layer: n_cols = number of filters / 16. Our grid of zeros will therefore have shape (height × n_cols, 16 × width). Note that the height and width of the feature maps are the same within a layer.

images_per_row = 16
for lr_name1, feat_map1 in zip(selected_layers1, selected_feature_maps1):
  n_filters1 = feat_map1.shape[-1]
  n_size1 = feat_map1.shape[1]
  n_cols = n_filters1 // images_per_row
  display_grid1 = np.zeros((n_size1 * n_cols, images_per_row * n_size1))
  for col in range(n_cols):
    for row in range(images_per_row):
      chan_img = feat_map1[0, :, :, col*images_per_row + row]
      chan_img = chan_img - chan_img.mean()
      chan_img = chan_img / chan_img.std()
      chan_img = chan_img * 64
      chan_img = chan_img + 128
      chan_img = np.clip(chan_img, 0, 255).astype('uint8')
      display_grid1[col * n_size1 : (col+1) * n_size1, row * n_size1 : (row+1) * n_size1] = chan_img
  scale1 = 1./n_size1
  plt.figure(figsize=(scale1 * display_grid1.shape[1]*1.4, scale1 * display_grid1.shape[0] * 2.))
  plt.title(lr_name1)
  plt.grid(False)
  plt.imshow(display_grid1, aspect='auto', cmap='viridis')
  plt.savefig('/content/gdrive/My Drive/Colab Notebooks/cat_dog_visual2_%s.png'%(lr_name1), dpi=300)
Figure 2: Arranging the same images as in Figure 1 more cleanly so that they are easy to interpret and understand.

With this representation, you can clearly see how the filters in the deeper layers of our model concentrate on specific features of the cat, like the shape of the eyes, the nose, the eyebrow region, etc.


In this post, we have seen how to use Google Colab to build and train a fairly large deep learning network. Our main focus was to visualize the journey of an image through several layers of a deep neural network, and we have learned two ways to do that. Also, rather than thinking of the deep layers as black boxes, visualization should help us see through them and understand them much better.


References:

[1] Very Deep Convolutional Networks for Large-Scale Image Recognition; K. Simonyan, A. Zisserman.

[2] Colab Tutorials by Google.

[3] Tensorflow Specialization Course: Deep Learning.ai

[4] Deep Learning with Python; Francois Chollet. pages: 160–177.

[5] Resource for Dealing with Files in Colab: Neptune.ai

[6] Link to the Notebook Used for this Post!

