Fast Feature Engineering in Python: Image Data

Make your images more suitable to feed into ML systems

Sayar Banerjee
Towards Data Science


“Finding patterns is easy in any kind of data-rich environment; that’s what mediocre gamblers do. The key is in determining whether the patterns represent noise or signal.”
Nate Silver

This article is part 2 of my “Fast Feature Engineering” series. If you have not read my first article, which covers tabular data, I encourage you to check it out here:

This article will look at some of the best practices to follow when performing image processing as part of our machine learning workflow.

Libraries

import random
from PIL import Image
import cv2
import numpy as np
from matplotlib import pyplot as plt
import json
import albumentations as A
import torch
import torchvision.models as models
import torchvision.transforms as transforms
import torch.nn as nn
from tqdm import tqdm_notebook
from torch.utils.data import DataLoader
from torchvision.datasets import CIFAR10

Resize/Scale Images

Resizing is the most fundamental transformation performed by deep learning practitioners. The primary reason for doing this is to ensure that the input received by our deep learning system has a consistent size.

Another reason for resizing is to reduce the number of parameters in the model. Smaller dimensions mean a smaller neural network, which saves us the time and computing power required to train our model.

What about the loss of information?

Some information is indeed lost when you resize down from a larger image. However, depending on your task, you can choose how much information you’re willing to sacrifice for training time and compute resources.

For example, an object detection task will require you to maintain the image's aspect ratio since the goal is to detect the exact position of objects.

In contrast, an image classification task may require you to resize all images down to a specified size (224 x 224 is a good rule of thumb).
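As a quick illustration, here is a minimal sketch of resizing with PIL (the file path image.jpg is a hypothetical placeholder, not from the original article):

# Load an image from disk (hypothetical path) and resize it to 224 x 224
image = Image.open("image.jpg")
print(image.size)             # original (width, height)

resized_image = image.resize((224, 224))
print(resized_image.size)     # (224, 224)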

[Original image (Source)]

After resizing, our image looks like this:

[Resized image (Source)]

Why perform image scaling?

Similar to tabular data, scaling images for classification tasks helps the training of our deep learning model converge to a minimum faster and more reliably.

Scaling ensures that a particular dimension does not dominate others. I found a fantastic answer on StackExchange regarding this. You can read it here.

One type of feature scaling is standardizing our pixel values. We do this by subtracting each channel's mean from its pixel values and then dividing by that channel's standard deviation.

This is a popular choice of feature engineering when training models for classification tasks.

Note: Like resizing, one may not want to do image scaling when performing object detection and image generation tasks.
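A minimal sketch of this standardization with NumPy (assuming image is the PIL image loaded earlier) might look like this:

# Convert the image to a float array of shape (height, width, channels)
pixels = np.asarray(image).astype("float32")

# Per-channel mean and standard deviation
mean = pixels.mean(axis=(0, 1))
std = pixels.std(axis=(0, 1))

# Standardize: subtract the channel mean, then divide by the channel std
standardized = (pixels - mean) / std

print(standardized.mean(axis=(0, 1)))  # roughly 0 for each channel
print(standardized.std(axis=(0, 1)))   # roughly 1 for each channel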

The example code above demonstrates the process of scaling an image via standardization. There are other forms of scaling such as centering and normalization.

Augmentations (Classification)

The primary motivation behind augmenting images is the large amount of data that computer vision tasks require. Often, obtaining enough images for training can prove challenging for a multitude of reasons.

Image augmentation enables us to create new training samples by slightly modifying the original ones.

In this example, we will look at how to apply vanilla augmentations for a classification task. We can use the out-of-the-box implementations from the Albumentations library to do this:
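A minimal sketch of such a pipeline (the specific transforms, probabilities, and file path below are illustrative choices, not the article's exact code):

# Read the image with OpenCV and convert BGR -> RGB, which Albumentations expects
image = cv2.imread("image.jpg")  # hypothetical path
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Declare an augmentation pipeline: blur, random crop, and horizontal flip
transform = A.Compose([
    A.GaussianBlur(p=0.5),
    A.RandomCrop(width=224, height=224),  # assumes the image is larger than 224 x 224
    A.HorizontalFlip(p=0.5),
])

# Apply the pipeline; the augmented image is returned under the "image" key
augmented_image = transform(image=image)["image"]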

[Image: Gaussian Blur, Random Crop, Flip (Source)]

By applying image augmentations, our deep learning models can generalize better to the task (avoiding overfitting), thereby increasing their predictive power on unseen data.

Augmentations (Object Detection)

The Albumentations library can also be used to create augmentations for other tasks, such as object detection. Object detection requires us to create bounding boxes around the objects of interest.

Working with raw data can prove challenging when trying to annotate images with bounding-box coordinates.

Fortunately, there are many publicly and freely available datasets that we can use to create an augmentation pipeline for object detection. One such dataset is the Chess Dataset.

The dataset contains 606 images of chess pieces on a chessboard.

Along with the images, a JSON file is provided that contains all the information pertaining to the bounding boxes for each chess piece in a single image.

By writing a simple function, we can visualize the data after the augmentation is applied:
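Such a helper might look like the sketch below (assuming COCO-style [x_min, y_min, width, height] boxes; the function name is my own):

def visualize_bboxes(image, bboxes, color=(255, 0, 0), thickness=2):
    """Draw COCO-style [x_min, y_min, width, height] boxes on a copy of the image."""
    image = image.copy()
    for x_min, y_min, w, h in bboxes:
        x_min, y_min, w, h = int(x_min), int(y_min), int(w), int(h)
        cv2.rectangle(image, (x_min, y_min), (x_min + w, y_min + h), color, thickness)
    plt.figure(figsize=(6, 6))
    plt.axis("off")
    plt.imshow(image)
    plt.show()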

[Image by author]

Now, let’s try to create an augmentation pipeline using Albumentations.
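First, we load the annotation file with the json module (the file name annotations.json is a hypothetical placeholder):

# Load the COCO-style annotation file (hypothetical path)
with open("annotations.json") as f:
    json_file = json.load(f)

print(json_file.keys())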

The JSON file that contains the annotation information has the following keys:

dict_keys(['info', 'licenses', 'categories', 'images', 'annotations'])

images contains information about the image files, whereas annotations contains information about the bounding boxes for each object in an image.

Finally, categories contains keys that map to the type of chess pieces in the image.

# Pull the three top-level lists out of the annotation file
image_list = json_file.get('images')
anno_list = json_file.get('annotations')
cat_list = json_file.get('categories')

image_list :

[{'id': 0,
'license': 1,
'file_name': 'IMG_0317_JPG.rf.00207d2fe8c0a0f20715333d49d22b4f.jpg',
'height': 416,
'width': 416,
'date_captured': '2021-02-23T17:32:58+00:00'},
{'id': 1,
'license': 1,
'file_name': '5a8433ec79c881f84ef19a07dc73665d_jpg.rf.00544a8110f323e0d7721b3acf2a9e1e.jpg',
'height': 416,
'width': 416,
'date_captured': '2021-02-23T17:32:58+00:00'},
{'id': 2,
'license': 1,
'file_name': '675619f2c8078824cfd182cec2eeba95_jpg.rf.0130e3c26b1bf275bf240894ba73ed7c.jpg',
'height': 416,
'width': 416,
'date_captured': '2021-02-23T17:32:58+00:00'},
...

anno_list :

[{'id': 0,
'image_id': 0,
'category_id': 7,
'bbox': [220, 14, 18, 46.023746508293286],
'area': 828.4274371492792,
'segmentation': [],
'iscrowd': 0},
{'id': 1,
'image_id': 1,
'category_id': 8,
'bbox': [187, 103, 22.686527154676014, 59.127992255841036],
'area': 1341.4088019136107,
'segmentation': [],
'iscrowd': 0},
{'id': 2,
'image_id': 2,
'category_id': 10,
'bbox': [203, 24, 24.26037020843023, 60.5],
'area': 1467.752397610029,
'segmentation': [],
'iscrowd': 0},
...

cat_list :

[{'id': 0, 'name': 'pieces', 'supercategory': 'none'},
{'id': 1, 'name': 'bishop', 'supercategory': 'pieces'},
{'id': 2, 'name': 'black-bishop', 'supercategory': 'pieces'},
{'id': 3, 'name': 'black-king', 'supercategory': 'pieces'},
{'id': 4, 'name': 'black-knight', 'supercategory': 'pieces'},
{'id': 5, 'name': 'black-pawn', 'supercategory': 'pieces'},
{'id': 6, 'name': 'black-queen', 'supercategory': 'pieces'},
{'id': 7, 'name': 'black-rook', 'supercategory': 'pieces'},
{'id': 8, 'name': 'white-bishop', 'supercategory': 'pieces'},
{'id': 9, 'name': 'white-king', 'supercategory': 'pieces'},
{'id': 10, 'name': 'white-knight', 'supercategory': 'pieces'},
{'id': 11, 'name': 'white-pawn', 'supercategory': 'pieces'},
{'id': 12, 'name': 'white-queen', 'supercategory': 'pieces'},
{'id': 13, 'name': 'white-rook', 'supercategory': 'pieces'}]

We have to alter the structure of these lists to create an efficient pipeline:
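One way of doing this (a sketch of the idea, with my own variable names, rather than the article's exact code) is to group the bounding boxes and category ids by image and build an id-to-name lookup for the categories:

# Map each image id to its file name
image_id_to_file = {img['id']: img['file_name'] for img in image_list}

# Group bounding boxes and category ids by the image they belong to
bboxes_per_image = {}
categories_per_image = {}
for anno in anno_list:
    image_id = anno['image_id']
    bboxes_per_image.setdefault(image_id, []).append(anno['bbox'])
    categories_per_image.setdefault(image_id, []).append(anno['category_id'])

# Map category ids to human-readable piece names
cat_id_to_name = {cat['id']: cat['name'] for cat in cat_list}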

Now, let’s create a simple augmentation pipeline that flips our image horizontally and adds a parameter for bounding boxes:
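A minimal sketch of that pipeline: the bbox_params argument tells Albumentations that the boxes are in COCO format ([x_min, y_min, width, height]) and that the category ids travel along with them:

# Horizontal flip plus bounding-box handling
transform = A.Compose(
    [A.HorizontalFlip(p=1.0)],
    bbox_params=A.BboxParams(format='coco', label_fields=['category_ids']),
)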

Lastly, we will create a dataset similar to the Dataset class offered by PyTorch. To do this, we need to define a class that implements the __len__ and __getitem__ methods.
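Here is a sketch of such a class, reusing the hypothetical lookups built above (the class name, the images directory, and the return format are my own assumptions):

from torch.utils.data import Dataset

class ChessDataset(Dataset):
    def __init__(self, image_ids, transform=None, image_dir='images'):
        self.image_ids = image_ids
        self.transform = transform
        self.image_dir = image_dir

    def __len__(self):
        return len(self.image_ids)

    def __getitem__(self, idx):
        image_id = self.image_ids[idx]

        # Read the image and convert it to RGB
        image = cv2.imread(f"{self.image_dir}/{image_id_to_file[image_id]}")
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

        bboxes = bboxes_per_image[image_id]
        category_ids = categories_per_image[image_id]

        # Apply the augmentation pipeline to the image and its boxes together
        if self.transform is not None:
            augmented = self.transform(image=image, bboxes=bboxes,
                                       category_ids=category_ids)
            image = augmented['image']
            bboxes = augmented['bboxes']
            category_ids = augmented['category_ids']

        return image, bboxes, category_ids

An instance can then be created with, for example, ChessDataset(list(image_id_to_file), transform=transform).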

Here are some of the results from iterating over the custom dataset:

[Images by author]

Thus, we can now easily pass this custom dataset to a data loader to train our model.

Feature Extraction

You may have heard of pre-trained models being used to train image classifiers and for other supervised learning tasks.

But, did you know that you can also use pre-trained models for feature extraction of images?

In short, feature extraction is a form of dimensionality reduction in which a large number of pixels is reduced to a more efficient representation.

This is primarily useful for unsupervised machine learning tasks such as reverse image search.

Let’s try to extract features from images using PyTorch’s pre-trained models. To do this, we must first define our feature extractor class:
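A minimal sketch of such a class (note that pretrained=True is the older torchvision argument; newer versions use a weights parameter instead):

class ResnetFeatureExtractor(nn.Module):
    def __init__(self, model):
        super().__init__()
        # Keep every layer of the original network except the final fully connected layer
        self.feature_extractor = nn.Sequential(*list(model.children())[:-1])

    def forward(self, x):
        # (batch_size, 512, 1, 1) -> (batch_size, 512)
        return self.feature_extractor(x).flatten(start_dim=1)

# Wrap torchvision's pre-trained resnet34 and switch to evaluation mode
resnet = models.resnet34(pretrained=True)
feature_extractor = ResnetFeatureExtractor(resnet).eval()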

Note that inside the constructor, a new model is created from all of the layers of the original except the last one. You will recall that the last layer of a classification network is a fully connected layer used to produce the prediction outputs.

However, since we are only interested in extracting features, we do not require this last layer. Hence, it is excluded.

We then utilize torchvision’s pre-trained resnet34 model by passing it to the ResnetFeatureExtractor constructor.

Let’s use the famous CIFAR10 dataset (50,000 images) and loop over it to extract the features.

[Image: CIFAR10 dataset (Source)]
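The extraction loop might look like the sketch below (the batch size and the standard ImageNet normalization constants are my own choices):

# Resize the CIFAR10 images and standardize them with the usual ImageNet statistics
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

dataset = CIFAR10(root='./data', train=True, download=True, transform=transform)
loader = DataLoader(dataset, batch_size=64, shuffle=False)

# Extract a 512-dimensional feature vector for every image
feature_list = []
with torch.no_grad():
    for images, _ in tqdm_notebook(loader):
        features = feature_extractor(images)
        feature_list.extend(features.numpy())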

We now have a list of 50,000 image feature vectors, each of size 512 (the output size of the penultimate layer of the original ResNet model).

print(f"Number of feature vectors: {len(feature_list)}") #50000
print(f"Number of feature vectors: {len(feature_list[0])}") #512

Thus, this list of feature vectors can now be used by statistical learning models such as KNN to search for similar images.
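For instance, a nearest-neighbor search over these vectors could be sketched with scikit-learn (not imported elsewhere in this article, so treat it as an illustrative assumption):

from sklearn.neighbors import NearestNeighbors

# Build a KNN index over the extracted feature vectors
knn = NearestNeighbors(n_neighbors=5, metric='cosine')
knn.fit(feature_list)

# Retrieve the five images most similar to the first image in the dataset
distances, indices = knn.kneighbors([feature_list[0]])
print(indices)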

If you have reached this far then thank you very much for reading this article! I hope you have a fantastic day ahead! 😄

👉 Code used in the article

Until next time! ✋
