Unsupervised Bayesian Inference (reducing dimensions and unearthing features)

Magic of the Sobel Operator

Low-level feature detection using edges

Aaron Dougherty
Towards Data Science
4 min read · Nov 11, 2020


Image by Julius Drost on Unsplash

Finally, the moment you have all been waiting for, the next in our unsupervised Bayesian inference series: Our (not so) deep dive into the Sobel Operator.

A truly magical edge detection algorithm, it enables low-level feature extraction and dimensionality reduction, essentially reducing noise in the image. It has been particularly useful in facial recognition applications.

The 1968 love child of Irwin Sobel and Gary Feldman (Stanford Artificial Intelligence Laboratory), this algorithm was the inspiration for many modern edge detection techniques. By convolving two opposing kernels or masks (one detecting horizontal edges, the other vertical) over a given image (see below-left), we can create a less noisy, smoothed-out representation (see below-right).

Image by Author, generated using Scikit-Image
Sobel Operator implementation example from Scikit-Image, altered by Author

Formally, it is presented as a discrete differential operator that approximates the gradient of the image intensity function. In plain English, the algorithm detects changes in pixel channel values (usually luminance) by computing the difference between each pixel (the anchor pixel) and its surrounding pixels, essentially approximating the derivative of the image.

This smooths the original image and produces a lower-dimensional output in which low-level geometric features can be seen more clearly. These outputs can then be used as inputs to more complex classification algorithms, or as examples for unsupervised probabilistic clustering via Kullback–Leibler Divergence (KLD) (as seen in my recent blog on LBP).

So how does it work?

Generating our lower-dimensional output requires us to take the derivative of the image. First, we approximate the derivative in both the x and y directions. We create two 3x3 kernels (see the matrices below), each with zeros along the centre of its corresponding axis, twos in the centre cells perpendicular to that line of zeros, and ones in each of the corners. The non-zero values are positive above / to the right of the zeros (depending on the axis) and negative on the opposite side. These kernels are named Gx and Gy.

These appear in the format:

Code by Author
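For reference, here is a minimal NumPy sketch of the two kernels exactly as described above. Note that the sign convention (positive to the right for Gx, positive on top for Gy) follows this description; some sources mirror the signs, which only flips the sign of the response, not its magnitude.

```python
import numpy as np

# Horizontal-change kernel: zeros down the centre column,
# positive values to the right, negative to the left.
Gx = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]])

# Vertical-change kernel: zeros along the centre row,
# positive values on top, negative on the bottom.
Gy = np.array([[ 1,  2,  1],
               [ 0,  0,  0],
               [-1, -2, -1]])
```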

These kernels are then convolved over our image, placing the central cell of each kernel over each pixel in the image. At each position we multiply the kernel element-wise with the 3x3 neighbourhood it covers and sum the results to obtain the intensity value of the corresponding pixel in an output image, for each kernel. We therefore end up with two output images (one for each Cartesian direction).

The aim is to measure, for each anchor pixel, how the intensity changes across its 3x3 neighbourhood, as weighted by the gradient matrices (kernels) Gx and Gy. More mathematically, this can be expressed as shown below.
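A reconstruction of the standard formulation (see the Wikipedia reference [2]), using the kernels defined above, with A denoting the source image and * the sliding-window operation discussed in the side note:

```latex
G_x =
\begin{bmatrix}
-1 & 0 & +1 \\
-2 & 0 & +2 \\
-1 & 0 & +1
\end{bmatrix} * A
\qquad\text{and}\qquad
G_y =
\begin{bmatrix}
+1 & +2 & +1 \\
 0 &  0 &  0 \\
-1 & -2 & -1
\end{bmatrix} * A
```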

Side note:

We are technically not convolving anything. Although we like to refer to ‘convolutions’ in AI and machine learning circles, a true convolution would involve flipping the kernel (by 180°) before sliding it over the image. Mathematically speaking, when we refer to convolutions here, we are really calculating the cross-correlation between each 3x3 area of the input and the mask, for each pixel. The output image is therefore a map of cross-correlation values between the mask and the input, and it is these values that reveal the edges.
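To make the distinction concrete, here is a small sketch (assuming SciPy is available) showing that a true convolution gives the same result as a cross-correlation with the flipped kernel:

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

image = np.random.rand(5, 5)
Gx = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]])

# True convolution flips the kernel before sliding it over the image;
# cross-correlation (what most ML libraries actually compute) does not.
conv = convolve2d(image, Gx, mode='same')
corr = correlate2d(image, np.flip(Gx), mode='same')

print(np.allclose(conv, corr))  # True: convolution == correlation with the flipped kernel
```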

Back to our Sobel explanation…

The products of each element-wise multiplication between the mask and the 3x3 section of the input are summed to produce the final value of the corresponding pixel in the output image. This generates a new image which contains information about the presence of vertical and horizontal edges in the original image, i.e. a geometric feature representation of the original image. Further, because the operator is deterministic (it produces the same output for the same input every time), this technique allows for stable edge detection in image segmentation tasks.
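As a toy illustration of that per-pixel step (the patch values here are made up: a flat region on the left and a bright column on the right):

```python
import numpy as np

# An illustrative 3x3 neighbourhood around a single anchor pixel.
patch = np.array([[10, 10, 50],
                  [10, 10, 50],
                  [10, 10, 50]], dtype=float)

Gx = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]])

# Element-wise product of the kernel with the neighbourhood,
# summed to a single value for this pixel in the output image.
value = np.sum(Gx * patch)
print(value)  # 160.0 -- a strong response, i.e. a vertical edge under this pixel
```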

From these outputs, we can calculate both the gradient magnitude and the gradient direction (using the arctangent operator) at any given pixel (x,y):

Code by Author
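As a sketch of that calculation (gx_image and gy_image stand in for the two directional outputs from the previous step; the values here are made up):

```python
import numpy as np

# Stand-ins for the two directional gradient images.
gx_image = np.array([[0.0, 4.0],
                     [1.0, 3.0]])
gy_image = np.array([[3.0, 0.0],
                     [1.0, 4.0]])

# Gradient magnitude: how strong the edge is at each pixel.
magnitude = np.sqrt(gx_image ** 2 + gy_image ** 2)

# Gradient direction: the edge orientation at each pixel.
# np.arctan2 handles gx == 0 without a division-by-zero error.
direction = np.arctan2(gy_image, gx_image)

print(magnitude[1, 1])  # 5.0 (from gx = 3, gy = 4)
```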

From this we can ascertain that those pixels with large magnitudes are more likely to be an edge in the image, whilst the direction informs us about the edge orientation (although the direction isn’t needed for generating our output).

A simple implementation:

Here is a simple implementation of the Sobel operator in Python (using NumPy) to give you a feel for the process (for all of you budding coders out there).

Code by Author
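A minimal NumPy sketch along these lines is shown below. It loops over pixels for clarity rather than speed, and the function name sobel, the variable names and the final normalisation are illustrative choices rather than part of any particular library.

```python
import numpy as np

def sobel(image):
    """Apply the Sobel operator to a 2D grayscale image and
    return the gradient magnitude, normalised to [0, 1]."""
    image = np.asarray(image, dtype=float)
    rows, cols = image.shape

    # The two gradient kernels described earlier.
    Gx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)
    Gy = np.array([[ 1,  2,  1],
                   [ 0,  0,  0],
                   [-1, -2, -1]], dtype=float)

    gx_image = np.zeros((rows, cols))
    gy_image = np.zeros((rows, cols))

    # Slide each kernel over every interior pixel: element-wise
    # multiply the kernel with the 3x3 neighbourhood and sum.
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            patch = image[y - 1:y + 2, x - 1:x + 2]
            gx_image[y, x] = np.sum(Gx * patch)
            gy_image[y, x] = np.sum(Gy * patch)

    # Combine the two directional gradients into one magnitude image.
    magnitude = np.sqrt(gx_image ** 2 + gy_image ** 2)

    # Normalise so the result can be viewed or saved as an image.
    return magnitude / magnitude.max()


# Example usage with a scikit-image sample picture:
# from skimage import data
# edges = sobel(data.camera())
```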

This implementation is a Python variant of the Wikipedia pseudocode example, aiming to show how you might implement this algorithm whilst also demonstrating the various steps in the process.

Why do we care?

Although the gradient approximation produced by this operator is relatively crude, it does provide an extremely computationally efficient method of computing edges, corners and other geometric features of images. In turn, it has paved the way for a number of dimensionality reduction and feature extraction techniques, such as Local Binary Patterns (see my other blog for more details on this technique).

Relying exclusively on geometric features does risk losing important information during the encoding of our data; however, this trade-off is what allows fast preprocessing of the data and, in turn, fast training. This technique is therefore especially useful in scenarios where texture and geometric features are highly important in defining the input, which is why face recognition can be completed with a relatively high degree of accuracy using the Sobel operator. However, it may not be the best choice when trying to classify or group inputs whose main differentiating factor is colour.

This method remains extremely effective for obtaining a low-dimensional representation of geometric features, and is an excellent starting point for dimensionality and noise reduction prior to using other classifiers or Bayesian inference methodologies.

References:

[1] D. Kroon, Numerical Optimization of Kernel Based Image Derivatives, Short Paper, University of Twente, 2009.

[2] https://en.wikipedia.org/wiki/Sobel_operator

