Understanding how to handle Image Data for Computer Vision and Deep Learning problems

Bikram Baruah
Towards Data Science
7 min read · Jan 30, 2022


Image by Bud Helisson from Unsplash

Introduction:

After working on multiple computer vision and deep learning projects over the past couple of years, I’ve gathered my thoughts in this blog on how to handle image data. It is almost always better to pre-process the data than to feed it straight into a deep learning model. At times, a deep learning model might not even be required, and a simple classifier might be enough after some processing.

Maximizing the signal and minimizing the noise in the image makes the problem at hand a lot easier to deal with. When building computer vision systems, consider applying filters that enhance the relevant features and make the image more robust to changes in lighting, color, etc.

With that in mind, let us explore a few methods that can help solve classical computer vision or image-based deep learning problems. The notebooks that go along with this blog can be found in this repository.

1. Go simple before going deep:

Before applying the latest and best deep learning model to solve a problem, try classical computer vision techniques, especially in cases where data is scarce, as it is in many real-world problems.

Check whether statistical values of the image pixels, such as the mean, standard deviation, and kurtosis, are statistically distinct for the different classes. A simple classifier such as an SVM or KNN can then be trained on those values to distinguish between the classes.
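As a minimal sketch of this idea, the per-image statistics can be computed with NumPy and fed to an SVM. The synthetic dark/bright images below are stand-ins for real data (the actual examples live in the linked notebooks):

```python
import numpy as np
from sklearn.svm import SVC

def image_stats(img):
    """Summarize an image as (mean, std, excess kurtosis) of its pixels."""
    px = img.astype(np.float64).ravel()
    mean, std = px.mean(), px.std()
    # Excess kurtosis computed directly, avoiding a SciPy dependency
    kurt = ((px - mean) ** 4).mean() / (std ** 4 + 1e-12) - 3.0
    return np.array([mean, std, kurt])

# Toy stand-ins for two visually distinct classes
rng = np.random.default_rng(0)
dark = [rng.integers(0, 60, (32, 32)) for _ in range(20)]
bright = [rng.integers(150, 255, (32, 32)) for _ in range(20)]

X = np.array([image_stats(im) for im in dark + bright])
y = np.array([0] * 20 + [1] * 20)

clf = SVC().fit(X, y)
print(clf.score(X, y))
```

If the simple features already separate the classes well, a deep model may be unnecessary for the task.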

2. Increase signal in the image and remove noise:

Check if pre-processing techniques enhance the main features of the image and increase the signal to noise ratio before feeding them into a deep learning model. This will help the model achieve better accuracy.

  • Use techniques such as thresholding, noise removal (for example erosion and dilation), and blurring, such as Gaussian blur (to smooth out edges) and median blur (to remove salt-and-pepper noise).
  • Different operators might be useful in a different order for different problems.
  • It is common practice to apply a particular operator more than once, a few steps after it was first applied, if doing so results in more enhanced features.

The best combination of kernel sizes for the filters, thresholding values, etc. (the number of combinations can run into the millions!) is hard to find by hand, so build interactive sliders to help find the ideal range of those values. An example of how this can be done is given below in Point 3 and in this notebook.

After that, try the method described in Point 1 and see if it gives you enough information for the task at hand.

3. Histogram equalization

Another way of enhancing the features of an image is histogram equalization, which improves the contrast of the image by spreading the most frequent pixel values more evenly across the intensity range.

Let us have a look at an example below.

Image by Joe Decker on Photocrati

As can be seen, the image above has very low contrast. In cases such as this, it is important to improve the contrast so that the features of the image are more clearly visible. OpenCV provides two techniques for doing so: Histogram Equalization and Contrast Limited Adaptive Histogram Equalization (CLAHE).

Applying histogram equalization does improve the contrast of the image. However, it also amplifies noise, as can be seen in the middle image below.

This is where CLAHE comes in. With this method, the image is divided into m x n tiles, and histogram equalization is applied to each tile separately. The ideal clip limit (the threshold for contrast) and tile grid size can be found using interactive sliders, as shown below.

Interactive sliders to find the best clip and tile value. Image by author
From left to right: Original image, Histogram Equalized image, Image post CLAHE. Image by author

The right image now has enhanced contrast, and the background and foreground trees are more visible than in the original image. The full notebook for histogram equalization and CLAHE is available here.

4. Converting the image to a different color space:

Converting the image into a different color space, such as HSV, can often provide better information for segmenting an object, for example in object tracking. The RGB color space typically isn’t robust to shadows or to slight changes in lighting, both of which influence the apparent color of the object. For tasks such as object tracking with classical computer vision, a finely tuned mask in RGB space will often fail later when used in a slightly different environment, for the reasons listed above. Once the image is converted to a space such as HSV, separating the channels also often helps in segmenting the area of interest and removing noise. As can be seen below, once the image has been converted to HSV space and the channels split, the shadow can be removed much more easily and the tennis ball segmented. The notebook on converting to HSV space and splitting the channels can be found here.

Different color spaces (RGB,HSV) and their components split. Image by author.

5. Normalizing the image:

If the images are being fed into a deep learning model, normalize them, for example by scaling pixel values to a standard range, so that the inputs to the network are standardized. Inside the network, Batch Normalization layers do the same for the inputs to each layer. This helps the network learn faster and be more stable, and batch normalization can also reduce generalization error.
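A common input-normalization sketch is to scale to [0, 1] and then standardize per channel; the exact scheme should match whatever the network expects:

```python
import numpy as np

def normalize_batch(imgs):
    """Scale uint8 images to [0, 1], then standardize per channel."""
    x = imgs.astype(np.float32) / 255.0
    mean = x.mean(axis=(0, 1, 2), keepdims=True)  # per-channel mean
    std = x.std(axis=(0, 1, 2), keepdims=True)    # per-channel std
    return (x - mean) / (std + 1e-7)

# Toy batch of 8 random RGB images, shape (N, H, W, C)
batch = np.random.default_rng(3).integers(0, 256, (8, 32, 32, 3), dtype=np.uint8)
norm = normalize_batch(batch)
```

In practice the mean and std are computed once on the training set and reused at test time, rather than recomputed per batch.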

6. Making sensible augmentations:

When augmenting images, make sure that the augmentation techniques being applied preserve the class of the image and produce images that resemble the data encountered in the real world. For example, applying a cropping augmentation to the image of a dog might result in an augmented image that no longer resembles a dog. The same can be true of rotation and flipping for some objects. Be very careful when changing image properties such as color, and make sure that the augmentation does not change the label of the image.

Always check that the augmented images make sense and reflect the real world.

Example of how augmentation such as random cropping can lead to corrupted data. Left image by Oscar Sutton from Unsplash
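As an illustration, a label-preserving augmentation might be limited to horizontal flips and mild brightness jitter; the helper below is a hypothetical sketch, not taken from the linked notebooks:

```python
import numpy as np

def augment(img, rng):
    """Label-preserving augmentations: horizontal flip + brightness jitter."""
    out = img.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1]             # horizontal flip: a dog is still a dog
    jitter = rng.integers(-20, 21)     # mild brightness change only
    out = np.clip(out.astype(np.int16) + jitter, 0, 255).astype(np.uint8)
    return out

rng = np.random.default_rng(5)
img = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)
aug = augment(img, rng)
```

Aggressive crops, vertical flips, or large color shifts are deliberately left out here, since for many classes they would no longer reflect the real world.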

7. Data leakage between the training and validation set:

This one applies to deep learning in general, but it is important to make sure that the same images (say, an original and its augmented version) do not end up in both the training and validation sets. This usually happens when augmentation is performed before the train-validation split. Ignoring this can make the model metrics give a false representation of its true performance, as the model will have learned from very similar images during training that are also present in the validation set.
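A sketch of the safe ordering, splitting first and augmenting only the training portion (synthetic arrays stand in for real images):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
images = rng.integers(0, 256, (100, 16, 16), dtype=np.uint8)
labels = rng.integers(0, 2, 100)

# Split FIRST, so no augmented copy of a validation image leaks into training
X_tr, X_val, y_tr, y_val = train_test_split(
    images, labels, test_size=0.2, random_state=0)

# THEN augment only the training images (here: horizontal flips)
X_tr_aug = np.concatenate([X_tr, X_tr[:, :, ::-1]])
y_tr_aug = np.concatenate([y_tr, y_tr])
```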

8. Representation of all categories on the test/ validation set:

Make sure that the test and validation sets contain examples of all the labels. This ensures that the model metrics reflect the true nature of the model.

Take the case where one of the labels has significantly fewer examples. Performing a random train-test split might result in that class not being represented in the validation/test set at all. The trained model will then never have been tested on that particular class, and the model metrics will not reflect its true performance. An example of how this can be done is found in this notebook.
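With scikit-learn, passing `stratify` to `train_test_split` preserves the class ratio in both splits, so a rare class still shows up in the test set (the toy labels below are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

labels = np.array([0] * 95 + [1] * 5)  # class 1 is rare
data = np.arange(100)                  # stand-in for image indices

# stratify=labels keeps the 95:5 class ratio in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    data, labels, test_size=0.2, stratify=labels, random_state=0)

print(np.bincount(y_te))  # the rare class is represented in the test set
```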

9. Post-processing sanity checks:

It is also important to perform some sanity checks once the model has been trained:

  • Make sure that, for a multiclass classifier, the predicted class probabilities add up to 1.
  • Make sure that any pre-processing applied to the images during training is also applied when the model is tested or deployed.
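The first check can be automated with a quick assertion on the model’s output probabilities; the sketch below uses a stand-alone softmax in place of a real model’s output head:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Stand-in for a model's raw outputs on two samples, three classes
probs = softmax(np.array([[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]))
assert np.allclose(probs.sum(axis=-1), 1.0)  # each row sums to 1
```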

I hope this blogpost gave you some insights into how to deal with image data for classical computer vision or deep learning problems. Let me know if you have any questions or methods you use while going about handling image data. Feel free to connect with me on Twitter and LinkedIn. If you don’t wanna miss out on future blogposts from me, you can subscribe here to get them delivered straight to your inbox! Thank you for your time and have a wonderful day ahead! :)
