A step-by-step guide to developing explainable AI that classifies surface crack images with PyTorch. This is the second part of a two-part series.
In the last part (Part 1), you saw how to build a CNN model using transfer learning and how to train the network. You also saw that the network performs well at classifying crack images. But you may have questions: how is the network making its decisions? Did it look at the right regions of an image when deciding? We cannot be sure. The network is a black box to us; we do not know what our convolutional neural network sees and understands when it makes a decision.
In this blog, we are going to produce visual explanation heat-maps that will help us understand how our deep convolutional neural network decides whether an image contains a crack or not.
In the previous part (Part 1), you did not see how to visualize the prediction results. Therefore, in this part I am going to do the following:
- Predict the crack class of different images using our trained network (inferencing)
- Use Grad-CAM [1] to generate visual explanation heat-maps that can explain the network's decisions

1. Inferencing
We have already trained our network in Part 1. Now we only need to see how to predict an image's class with the trained network. To do this, we need to create a Python file called inference.py and keep it in the working directory.
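Here is a minimal sketch of what the first part of inference.py can look like. The helper names random_image() and save_image() follow the walkthrough below; the 3×3 grid layout, figure size, and output file name are my assumptions:

```python
import os
import random
import argparse

import torch
import torch.nn as nn
import matplotlib.pyplot as plt
from PIL import Image
from torchvision import models, transforms


def random_image(data_dir, n=9):
    # Randomly pick n image paths from the given class folder
    files = os.listdir(data_dir)
    return [os.path.join(data_dir, f) for f in random.sample(files, n)]


def save_image(images, probs, out_path="inference_result.png"):
    # Plot a 3x3 grid of images with their prediction probabilities on top
    fig, axes = plt.subplots(3, 3, figsize=(9, 9))
    for ax, img, prob in zip(axes.flat, images, probs):
        ax.imshow(img)
        ax.set_title(f"p = {prob:.2f}")
        ax.axis("off")
    fig.savefig(out_path)
```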
If you look at the code above, you may notice that this part is very similar to the training file: we import the same libraries that we imported for training. After the imports, we create the helper functions that will assist us with inference.
We will randomly select some samples from the test data directory for prediction. The function random_image() picks those samples, and the second helper, save_image(), saves our output results to disk.
So, our helper functions are ready. Now we need a function that takes the input images and returns the prediction results from our trained network. To do this, we will create a function called inference().
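Continuing inference.py, here is a sketch of inference() and the argument parsing. The dataset path, the ResNet-18 backbone, the weight file name trained_model.pth, and the 224×224 input size are assumptions based on a typical Part 1 setup:

```python
def inference(p):
    data_dir = "./dataset/test"                    # test dataset directory (assumed path)
    sample_dir = os.path.join(data_dir, p.type)    # Positive or Negative subfolder

    # Compile the model and load the trained weights
    # (ResNet-18 backbone and weight file name assumed from Part 1)
    model = models.resnet18(pretrained=True)
    model.fc = nn.Linear(model.fc.in_features, 2)
    model.load_state_dict(torch.load("trained_model.pth", map_location="cpu"))
    model.eval()

    # Randomly pick test images with our helper
    image_paths = random_image(sample_dir)
    pil_images = [Image.open(path).convert("RGB") for path in image_paths]

    # Transform the test images and stack them into one batch
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])
    batch = torch.stack([transform(img) for img in pil_images])

    # Pass the batch through the model: logits, then softmax probabilities
    with torch.no_grad():
        logits = model(batch)
        probs = torch.softmax(logits, dim=1)

    # Save the grid of images annotated with the probability of the chosen class
    class_idx = 1 if p.type == "Positive" else 0
    save_image(pil_images, probs[:, class_idx].tolist())


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--type", default="Positive",
                        help="test subfolder to sample from: Positive or Negative")
    p = parser.parse_args()
    inference(p)
```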
In the code above, the first line of inference() sets our test dataset's directory. In the test directory there are two subfolders, Positive and Negative, and we can take our test samples from either one. The next line defines which folder we choose our test sample images from; notice that the argument p.type carries this Positive-or-Negative information. Now look at the bottom of the file: we call ArgumentParser(), which parses the arguments given with the bash command. We add one argument, type, meaning that we want to pass the information type from the command line; by default, its value is Positive. We then store the parsed arguments in p, so p holds the argument type, and type holds the value Positive. When you run the command in your command prompt/bash, this will become clearer.

Okay, now let's get back to the function inference(). The rest is simple: we compile our model and load the trained weights, then create a list of random images with the helper function random_image() that we created earlier. Next, we transform our test images and stack them together into one batch to pass through the model. Passing the batch through the model gives us the prediction logits, and taking a softmax over them gives the prediction probabilities for the test images. With the prediction probabilities in hand, we save the results with save_image().
We are done with the coding to visualize the prediction results. To run the code, enter the following command in your terminal (the --type flag follows the argument parser sketched above). It will take 9 samples from the Positive folder, produce the results, and save them to your disk.
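```
python inference.py --type Positive
```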
Note that we entered type as Positive; as a result, it takes samples from the Positive folder. If we had entered type as Negative, it would have taken samples from the Negative folder.
When you run the command, you will get something like the image below: a grid of 9 images that have passed through our network, with the prediction probabilities on top of each image. The results are quite good.

2. Visual Explanation
Now, we are going to generate visual explanation heat-maps using the Grad-CAM [1] algorithm. The heat-maps identify the image regions that influence the network's decision, so by looking at a heat-map we can easily understand which image pixels contribute to the network's decision.
To work with grad-cam, type the following in your terminal to install it on your local machine (the library is published on PyPI as grad-cam):
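```
pip install grad-cam
```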
Now, from this GitHub repository, download the folder pytorch_grad_cam and put it in your working directory. You can see that there is source code for different variants of grad-cam inside that folder. Once you have this folder in your directory, you are ready to use grad-cam and some of its variant algorithms.
Now you are all set to use grad-cam to generate heat-maps. At this point, you need to create a Python file called xai.py in the main working directory.
Inside xai.py, first let's import the required libraries and our model:
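A minimal sketch (cv2, i.e. OpenCV, is my assumption for writing the heat-map images to disk; your exact imports may differ slightly):

```python
import os
import random
import argparse

import cv2                    # assumed here for saving heat-map overlays
import numpy as np
import torch
import torch.nn as nn
from PIL import Image
from torchvision import models
```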
Then, we need to import the grad-cam libraries:
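Assuming the downloaded folder keeps the library's own layout, the imports would look like this (the set of variants mirrors what the library ships):

```python
from pytorch_grad_cam import (
    GradCAM,
    ScoreCAM,
    GradCAMPlusPlus,
    AblationCAM,
    XGradCAM,
    EigenCAM,
)
from pytorch_grad_cam.utils.image import show_cam_on_image
```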
In the above code, notice that we imported GradCAM and different variants of it from the pytorch_grad_cam folder. We also imported some packages for image visualization.
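After the imports comes a short setup block; here is a sketch (the dictionary keys are my choice of command-line names, and random_image() mirrors the helper from inference.py):

```python
# Map command-line method names to the imported CAM classes
methods = {
    "gradcam": GradCAM,
    "scorecam": ScoreCAM,
    "gradcam++": GradCAMPlusPlus,
    "ablationcam": AblationCAM,
    "xgradcam": XGradCAM,
    "eigencam": EigenCAM,
}


def random_image(data_dir, n=9):
    # Randomly pick n image paths from the given class folder
    files = os.listdir(data_dir)
    return [os.path.join(data_dir, f) for f in random.sample(files, n)]
```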
The above part is pretty straightforward. We save all the imported method names inside a dictionary, so we can later call any of the methods from it to produce heat-maps. The other function is our helper for random image selection.
Then, we need to create our main wrapper function xai() for generating our heat-maps.
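Here is a sketch of xai() under the same assumptions as before (weight file name, ResNet-18 backbone, 224×224 input). Note that the constructor and call signatures below follow the target_category-style API that pytorch-grad-cam used around the time this post was written; newer releases use target_layers and targets instead:

```python
def xai(p):
    # Compile the model and load the trained weights, as in inference.py
    model = models.resnet18(pretrained=True)
    model.fc = nn.Linear(model.fc.in_features, 2)
    model.load_state_dict(torch.load("trained_model.pth", map_location="cpu"))
    model.eval()

    # Target layer: the last block of the ResNet's final convolutional stage;
    # the gradients flowing into it are used to build the heat-maps
    target_layer = model.layer4[-1]

    # Target category: 0 for the Negative class, 1 for the Positive class
    target_category = 1 if p.type == "Positive" else 0

    # Pick the requested grad-cam variant and wrap our model with it
    cam = methods[p.method](model=model, target_layer=target_layer)

    # Loop through randomly chosen test images and save a heat-map for each
    data_dir = os.path.join("./dataset/test", p.type)   # assumed path
    for path in random_image(data_dir):
        rgb = np.float32(Image.open(path).convert("RGB").resize((224, 224))) / 255.0
        input_tensor = torch.from_numpy(rgb).permute(2, 0, 1).unsqueeze(0)
        grayscale_cam = cam(input_tensor=input_tensor, target_category=target_category)
        heatmap = show_cam_on_image(rgb, grayscale_cam[0, :])
        cv2.imwrite("cam_" + os.path.basename(path), heatmap)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--type", default="Positive",
                        help="test subfolder to sample from: Positive or Negative")
    parser.add_argument("--method", default="gradcam",
                        help="one of: " + ", ".join(methods))
    p = parser.parse_args()
    xai(p)
```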
Inside xai(), we first compile our model and load the trained weights. We then need to select a target layer whose gradients will be used to produce the heat-maps; we set it to the last layer of the ResNet in our architecture.
We also need a target category for heat-map generation. We have only two classes: for the negative class, target_category is 0, and for the positive class it is 1.
Then we select the grad-cam method that we want to use from the methods dictionary and wrap our model with it. Finally, we loop through our images to create heat-maps using the chosen method.
To run the code, you need to type the following in your terminal (the flag names follow the argparse sketch above):
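```
python xai.py --type Positive --method gradcam
```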
Here, notice that we gave two arguments: one is type and the other is method. With type we select positive samples from our test dataset, and with method we set the method for our heat-map generation. Here we set gradcam as our method, but you can select any of the other methods, e.g., XGrad-CAM, Eigen-CAM, etc., as well.
If you run the command, you will get something like the image below: a grid of 9 images with heat-maps overlaid. The hot zones indicate the regions responsible for the network's decision for the Positive class.

If you look at these images, you can see that the hot zones overlap with the cracks in the images. That means our network is looking at the right locations, i.e., the locations of the cracks, when making its decision.
Conclusion
In this part, you have seen how to run inference with your trained model to predict the crack class of different images. You have also seen how to use a grad-cam tool to generate heat-maps that can explain the network's decisions. Looking at the heat-maps, you can easily understand how your deep learning algorithm makes decisions. You can easily extend this example to any image classification problem to explain your network's decisions.
You can find the complete code on GitHub. If you have not checked out the previous part yet, find it here.
References:
[1] Ramprasaath R. Selvaraju et al., 2019, "Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization", https://arxiv.org/pdf/1610.02391.pdf