GLCMs — a Great Tool for Your ML Arsenal

The theory of Gray Level Co-occurrence Matrices (GLCMs) and a practical application (with code!)

Martim Chaves
Towards Data Science


  • What are Gray Level Co-occurrence Matrices (GLCMs)?
  • What are they good for?
  • How can we apply them?

These are the questions that I will be answering in this article. Trust me on this one, I think GLCMs are cool, and I hope that by the end of this you’ll think the same. They can be used to extract texture features from images, as will be done in the last section of this article. I am assuming that you have a general idea of what images are and some basic understanding of math.

The code used (later on) can be found in the following git repo:

So, let’s start with what GLCMs are.

For that we need an image and a positional operator — don’t worry about the latter for now.

Let’s look at a simple image, an image whose pixel values can be either 0, 1, or 2. Yes, it’s not a particularly exciting image. It also doesn’t seem to hold any relevant information whatsoever, but to explain this, it will do, so bear with me.

On the left, there is a small image with 3 tones of gray; on the right, the same image with the actual pixel values shown instead of the gray tones.
Example image (3 possible pixel values — 0, 1, and 2, from black to white) by author

Now, imagine that you’d like to know how often, for a given pixel, the pixel below it has the same value. You believe that this will give you an idea about the structure of this image. Maybe there are repeating patterns that will be evident once you assess this — who knows?

To do this, you start by creating a very simple table of sorts, with one row and three columns. In the first cell, you put the value for how many times a pixel with the value 0 has a pixel below it with value 0 as well. In the second cell, you do the same, but for the value 1, and in the third cell for the value 2. For our image, that process would look like this:

Constructing a simple preliminary table by author

I guess you could say that, maybe, this image has a lot of dark and bright spots. After all, the lowest and highest values often “appear together”, so to speak. There are also no situations where the mid-range value has the same value below itself.

But now you’re curious! Instead of wanting to know how often, for a given pixel, the pixel below it has the same value, you want to have a broader perspective.

You want to know, for each possible pixel value, which values usually appear one pixel below it. Are all the pixels with value 0 sitting above pixels that also hold the value 0? Of course, we know that is not the case; otherwise, there would be recognizable black vertical lines in the image. Even so, perhaps you can start to gain some intuition as to why the spatial relationship between pixel values could be relevant.

So, to do that, you create a matrix, matrix C. This matrix has three rows and three columns, one row and one column for each of the possible pixel values (0, 1, 2). Considering the notation where, for C(i,j), i is the row and j is the column, C(0,0) will hold the value for how many times, for a pixel with the value 0, the pixel below is also 0. C(0,1) will hold the value for how many times, for a pixel with the value 0, the pixel below is 1, and analogously for C(0,2). So i is sort of our “starting value”, and j is our “target value”. C(1,2) will hold the number of times the value 2 is below the value 1. Once you do that, this is what you get:

Image shown before with the described table below it.
Intermediate stage for the calculation of the GLCM by author

Looking at our matrix C, our next step is to normalize the matrix. First, you sum the values of the matrix, to get the total number of pairs. Afterward, you divide each value of C by that number — and there you have it, a normalized matrix. So now you have probabilities, a probability of a certain pair of values occurring in a given image, following a certain positional relationship.
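As a tiny sketch of that normalization step (the counts below are made up, not the ones from the example image):

```python
import numpy as np

# Hypothetical (un-normalized) co-occurrence counts for a 3-level image;
# the real values depend on the image and the positional operator you use
C = np.array([[6., 1., 2.],
              [1., 0., 3.],
              [2., 3., 7.]])

C = C / C.sum()   # divide by the total number of pairs -> probabilities
print(C.sum())    # 1.0: each entry is now the probability of that pair of values
```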

C *is* a GLCM — congrats, you’ve made it!

Here’s a short video demonstrating this final process:

Final step for the calculation of the GLCM by author

Now, perhaps you’re wondering, why am I looking specifically at the relationship between a pixel and the pixel below it? Perhaps, you’d be more interested in knowing how the pixels’ values compare to the pixels on their right. Or on their left, or a pixel below them and to their right, so diagonally, or whatever comes to mind…

This is what the positional operator determines, and the positional operator is up to you to decide.

Consider the image as a matrix itself, matrix I, where i is the row and j the column. You can write the positional operator as a pair of numbers, for example (1,0). For each pixel I(i,j), you will be looking at the value of the pixel I(i+1,j+0), to fill in your matrix C.

For example, for the image above, consider the pixel I(1,2) — you can see that it holds the value 2. The pixel below it, I(2,2), also holds the value 2. So, you would add +1 to the value in the cell C(2,2), as the starting pixel value is 2, and the value of the target pixel, the one below the original one, is also 2.

Essentially, you can just add the values of the positional operator to the coordinates of the original pixel to get the values that you need to consider to fill in C. To completely fill in C, you have to do that for every pixel of the image.

Maybe, just maybe, you want to do something crazy. You want to capture the relationship between a pixel and the pixel that’s 8 units below it and 3 units to its left. Yeah, that’s crazy. For that, your positional operator would look like (8,-3). Now, perhaps this wouldn’t be the best positional operator. It’s not often that this spatial relationship is especially determinant for us to understand the content of an image. Nevertheless, the message here is that you are the boss of your positional operators.

So, now on to the next question — what can you do with this? What are GLCMs good for?

Essentially, you can analyze C — imagine you want to understand whether or not an image I is homogeneous.

Let’s look at the GLCM of I using a (0,1) positional operator. If I is very homogeneous, then the largest values of C will be found on the main diagonal, i.e. C(0,0), C(1,1), C(2,2). This is the case simply because it means that two adjacent pixels often have the same value. This in turn means that there are likely to be large regions with the same value; that is, a homogeneous image.

The opposite would be if the highest values of C were found in either the top right or bottom left cell. That would mean that it’s more common to have very distinct values together, which would mean that I is an image with a lot of contrast! We can reflect these ideas in a more quantitative way.

For example, we can use the following equation to calculate homogeneity:

$$\text{homogeneity} = \sum_{i,j} \frac{C(i,j)}{1 + |i - j|}$$

The homogeneity value that results from this equation will only be large if the larger values of C are found on its main diagonal. For values outside of the main diagonal, the |i-j| term guarantees that the further away they are from the main diagonal, the less they contribute.

Contrast can be calculated using the following equation:

$$\text{contrast} = \sum_{i,j} (i - j)^2 \, C(i,j)$$

In this case, we have the opposite: the further values are from the main diagonal of C, the more importance they are given, since they are multiplied by the (i-j)² term.

There are other useful equations that can be used to analyze GLCMs. They operate in a similar way to the ones presented before.

For example, entropy:

$$\text{entropy} = -\sum_{i,j} C(i,j) \, \log\big(C(i,j)\big)$$

And, correlation:

$$\text{correlation} = \sum_{i,j} \frac{(i - \mu_i)(j - \mu_j) \, C(i,j)}{\sigma_i \, \sigma_j}$$

Where:

$$\mu_i = \sum_{i,j} i \, C(i,j) \qquad \mu_j = \sum_{i,j} j \, C(i,j)$$

$$\sigma_i = \sqrt{\sum_{i,j} (i - \mu_i)^2 \, C(i,j)} \qquad \sigma_j = \sqrt{\sum_{i,j} (j - \mu_j)^2 \, C(i,j)}$$
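To make these formulas concrete, here is a minimal NumPy sketch that computes the four measures from a normalized GLCM (the function name and the small epsilon inside the logarithm are my own additions):

```python
import numpy as np

def glcm_measures(glcm):
    """Homogeneity, contrast, entropy and correlation of a normalized GLCM."""
    levels = glcm.shape[0]
    i, j = np.meshgrid(np.arange(levels), np.arange(levels), indexing="ij")

    homogeneity = np.sum(glcm / (1.0 + np.abs(i - j)))
    contrast = np.sum(glcm * (i - j) ** 2)
    entropy = -np.sum(glcm * np.log(glcm + 1e-12))   # small epsilon avoids log(0)

    mu_i, mu_j = np.sum(i * glcm), np.sum(j * glcm)   # means of the row/column indices
    sigma_i = np.sqrt(np.sum(glcm * (i - mu_i) ** 2)) # and their standard deviations
    sigma_j = np.sqrt(np.sum(glcm * (j - mu_j) ** 2))
    correlation = np.sum(glcm * (i - mu_i) * (j - mu_j)) / (sigma_i * sigma_j)

    return homogeneity, contrast, entropy, correlation
```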

Homogeneity, contrast, entropy, and so on are examples of possible features to use for machine learning. You could also use the GLCM itself as an input to a convolutional neural network, for example.

GLCMs are usually associated with texture. Texture can be seen as a repetition of visual patterns. Using an appropriate positional operator, it makes sense that GLCMs can be used to extract information about the textures present in images. After all, a repetition of visual patterns can be seen as a repetition of combinations of values with a certain orientation.

This is an example of a pro of GLCMs: they are great for textures! A con of GLCMs is that they are somewhat computationally expensive, especially if you intend to use several positional operators, since that would require several matrices to be computed. Another important con to keep in mind is that if you only use one positional operator, the resulting GLCM will not be invariant to feature rotation or scale changes.

Here’s a very simple algorithm to obtain the GLCM of an image (in Python; the link to the git repo was shared at the beginning, and this is in the file my_glcm/my_glcm.py):

Algorithm to calculate GLCM (log of GLCM is shown for visibility purposes) by author
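Here is a minimal sketch of such an algorithm (my own variable and function names; the version in the repo may differ in the details):

```python
import numpy as np

def compute_glcm(image, offset=(1, 0), levels=3):
    """Build a normalized GLCM for an integer image and a (row, col) positional operator."""
    rows, cols = image.shape
    dr, dc = offset
    glcm = np.zeros((levels, levels))

    for r in range(rows):
        for c in range(cols):
            tr, tc = r + dr, c + dc                  # coordinates of the "target" pixel
            if 0 <= tr < rows and 0 <= tc < cols:    # skip pairs that fall outside the image
                glcm[image[r, c], image[tr, tc]] += 1

    return glcm / glcm.sum()                         # normalize counts into probabilities

# Example: a random image with values 0-2 and the "pixel below" operator
img = np.random.randint(0, 3, size=(8, 8))
print(compute_glcm(img, offset=(1, 0), levels=3))
```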

Alternatively, you can use graycomatrix from the skimage Python library! You can look at the docs here:

  • To calculate the GLCM:
  • To calculate the properties of the GLCM (image features):

Note that they use a slightly different system: the positional operator is determined by an angle and a distance, but the fundamentals are as described.
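As a rough usage sketch (in recent versions of skimage the functions are called graycomatrix and graycoprops; an angle of π/2 with distance 1 roughly plays the role of the (1,0) operator used above, but check the docs for the exact convention):

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

# Any 8-bit grayscale image will do; here, random values as a stand-in
image = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)

# The positional operator is given as distances and angles instead of (row, col) offsets
glcm = graycomatrix(image, distances=[1], angles=[np.pi / 2],
                    levels=256, symmetric=False, normed=True)

# graycoprops returns one value per (distance, angle) combination
for prop in ("homogeneity", "contrast", "energy", "correlation"):
    print(prop, graycoprops(glcm, prop)[0, 0])
```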

For curiosity’s sake, let’s look at the logarithm of the GLCM of two different images:

Image split into a top half and a bottom half. On the top half, there is an image of the sky on the left, with its GLCM on the right. On the bottom half, there is an image of umbrellas and a brick wall on the left, and its GLCM on the right.
a) Left image by Kate Palitava, on the right log(GLCM) of the left image by author; b) Left image by Parviz Besharat Pur, on the right log(GLCM) of left image by author. Note that the logarithm is only used to improve visibility.

It seems that for image a), the highest values of the GLCM are concentrated around the main diagonal. This isn’t surprising as the image doesn’t have any stark changes. It’s mostly a picture of the sky, and the intensities of the values change smoothly. You could say that the image is quite homogeneous.

When you compare the GLCM of a) with the GLCM of b), you can notice a clear difference. The values of the GLCM of b) are much more spread out, which makes sense. The shadows of the umbrellas against the brick wall, as well as the brick wall itself, introduce some contrast. Hopefully, by now, GLCMs are reasonably intuitive to you.

Finally, how can we apply them?

Let’s actually use GLCMs for a possible real-world application: classifying whether or not a satellite image contains an oil palm plantation!

I thought that satellite images would be great to demonstrate what GLCMs could do. After all, the difference between the texture of images of mountains and oceans, for example, is noticeable. After some searching, I found this dataset on Kaggle, with satellite images of oil palm plantations. It was related to the Women in Data Science (WiDS) Datathon 2019, and I decided to give it a try. Here’s the link if you’re interested:

I started with a little bit of exploratory data analysis (EDA). For that, I looked at random images belonging to each class, to get a general idea of what the main differences were. It seemed that images with oil palm plantations contained clearly noticeable yellowish lines. These were probably roads used to access different plots of land, I assumed.

Example of Images *without* oil palm plantations:

Several images that don’t have oil palm plantations.
Different channels of Images without oil palm plantations by author

Example of Images *with* oil palm plantations:

Several images that have oil palm plantations.
Different channels of Images with oil palm plantations by author

To further confirm this, I looked at the mean image and the first “Eigen image” of each class. These images all showed that, indeed, satellite images of oil palm plantations contained a texture distinct from that of regular satellite images. This texture could perhaps be described as clear thin lines crossing the image, often parallel to the axes of the image.

Mean image and Eigen image for both classes.
Mean and Eigen image for each class by author

Afterward, it was time to extract some texture features from the images! To do that, first, the images were separated into their RGB and HSV channels.

The images in this dataset are RGB images. This means that the images are three-dimensional arrays. Besides the width and the height, there is a third dimension, which is due to the three different color channels: Red, Green, and Blue (RGB). These three primary color channels overlap and create the many different colors that we can see in an image. However, there are different systems (or color spaces) to represent color images. The HSV color space, which stands for Hue, Saturation, and Value, is one of them. Hue is the pure color, saturation is how rich that color is, and value is how intense that color is (the less intense, the darker it is). Please see the image below.

HSV color space by Michael Horvath. No changes were made. Used under the GNU Free Documentation License 1.2, which can be found here: https://www.gnu.org/licenses/old-licenses/fdl-1.2.en.html

This system is often used in computer vision, as it was designed to be similar to how humans interpret colors. Having the actual color separate from how light or dark it is allows, in some cases, for added robustness. Imagine, for example, a dataset where some images have shadows and others don’t. If the actual colors (hue) present in the images are vital information, then it makes sense to use HSV. Ultimately, it is a different way to represent color information, which may or may not be useful, depending on the application.

In this case, looking at random satellite images of different classes, it seemed that also using the HSV color space made sense. Notice how the saturation channel seems somewhat consistently different between the two classes. For the class with no oil palm plantation, the saturation seems to have a distinct texture. I thought that, perhaps, the hue and value channels also contained some relevant information that was not immediately evident. For these reasons, the HSV color space was also used.

The grayscale image was also calculated by averaging the RGB channels.
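A minimal sketch of that channel split, assuming the image is an RGB array with values in [0, 1] (the function and variable names are mine):

```python
from skimage.color import rgb2hsv

def split_channels(rgb):
    """Return the 7 single-channel arrays: R, G, B, H, S, V and grayscale."""
    hsv = rgb2hsv(rgb)
    gray = rgb.mean(axis=2)   # simple grayscale: average of the three RGB channels
    return [rgb[..., k] for k in range(3)] + [hsv[..., k] for k in range(3)] + [gray]
```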

Second, the GLCMs of these 2-dimensional arrays, the different RGB and HSV channels, as well as the grayscale, were calculated, using a (1,0) positional operator.

I thought that this positional operator was decent, given that horizontal lines were often found in satellite images of oil palm plantations. However, note that this may be fragile, as it means that the features depend on the orientation of the image. If the orientation were to change, this positional operator could become less effective. To account for this, using more positional operators is recommended. For this article, only one was used, for simplicity and to reduce computation time.

Finally, the homogeneity, contrast, energy, and correlation were calculated for each GLCM. Considering this, for each image, there were 28 features, 7 GLCMs times the 4 “measures” used.
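Putting the pieces together, here is a rough sketch of the per-image feature extraction (the 8-bit quantization step and the helper from the previous sketch are my own assumptions):

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

PROPS = ("homogeneity", "contrast", "energy", "correlation")

def image_features(channels):
    """4 GLCM measures for each of the 7 channels -> 28 features per image."""
    features = []
    for channel in channels:                                    # as returned by split_channels
        img = np.clip(channel * 255, 0, 255).astype(np.uint8)   # quantize to 8 bits
        glcm = graycomatrix(img, distances=[1], angles=[np.pi / 2],
                            levels=256, normed=True)
        features.extend(graycoprops(glcm, prop)[0, 0] for prop in PROPS)
    return np.array(features)
```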

28 features isn’t necessarily a lot, but it was still necessary to remove the ones that were not needed, to take the curse of dimensionality and useless information into account. For that, the correlation between the features was calculated first. Then, from each pair of highly correlated features, only one was kept.

This reduced the pool of features quite a bit, to only 14 features. To further reduce this number, as that would also help with the computing time, the correlation between the remaining features and the label was calculated. Only features with at least a small correlation with the label were kept. Finally, only 8 features remained — these were the ones that would be used to train a model.

They were:

  • The homogeneity of the red channel
  • The contrast of the red and saturation channels
  • The energy of the green and red channels
  • The correlation of the red, hue, and saturation channels
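As a sketch of the two-step selection described above, using pandas (the 0.9 and 0.1 thresholds are illustrative assumptions, not necessarily the exact values used):

```python
import pandas as pd

def select_features(df, label_col="label", corr_thresh=0.9, label_thresh=0.1):
    """Drop one feature from each highly correlated pair, then keep only features
    with at least a small correlation with the label."""
    features = df.drop(columns=[label_col])
    corr = features.corr().abs()

    dropped = set()
    cols = list(features.columns)
    for a_idx, a in enumerate(cols):
        for b in cols[a_idx + 1:]:
            if a not in dropped and b not in dropped and corr.loc[a, b] > corr_thresh:
                dropped.add(b)                      # keep one of each highly correlated pair

    kept = [c for c in cols if c not in dropped]
    label_corr = df[kept].corrwith(df[label_col]).abs()
    return [c for c in kept if label_corr[c] >= label_thresh]
```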

Below you can find a pair plot with the 5 features with the highest correlation to the label. The first letter before the underscore represents the channel.

Pair plot for the 5 “best” features.
Pair plot for the five features with highest correlation with the label by author

Having chosen our features, it was time to pass them to a model and train it. After removing 10% of the data for the test set, the remaining data was split into 4 stratified folds — it was a highly imbalanced dataset. These folds were then used with 5 different classifiers, to determine which one was best. The classifiers used were:

  • Random Forest
  • Logistic Regression
  • Gradient Boosting
  • SVM
  • KNN

To determine which classifier was best, three metrics were calculated: balanced accuracy, AUC, and the F1-score. Ultimately, thinking of a possible real-world application, I decided to choose the classifier that had the best F1-score.

My idea here was that the consequences of not detecting oil palm plantations were not severe (unlike in the medical field, for example). At the same time, it would be best to discard most of the negative samples, that is, the ones with no plantations. That would mean a model that, when it detects an oil palm plantation, is likely to be right, which is what the F1-score captures, since it balances precision and recall.
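A condensed sketch of this comparison with scikit-learn; the synthetic stand-in data and the default model settings are my own assumptions, not the exact setup used:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Stand-in for the real 8 GLCM features and the imbalanced labels
X_train, y_train = make_classification(n_samples=1000, n_features=8,
                                       weights=[0.9, 0.1], random_state=0)

classifiers = {
    "Random Forest": RandomForestClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Gradient Boosting": GradientBoostingClassifier(),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
}

cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
scoring = ("balanced_accuracy", "roc_auc", "f1")

for name, clf in classifiers.items():
    scores = cross_validate(clf, X_train, y_train, cv=cv, scoring=scoring)
    print(name, {m: round(scores[f"test_{m}"].mean(), 3) for m in scoring})
```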

Below you can find a graph with the results for each classifier. Each point corresponds to the mean of the different folds and the error bars are the standard deviations. The chosen model was the KNN.

Graph showing the metrics results for different classifiers.
Results for different classifiers by author

Nevertheless, this seems debatable. What do you think? Curious to know!

These results were obtained using pretty much the default hyperparameters set by sklearn. Curious to know whether they made a significant difference, I ran a random grid search over three of the hyperparameters (a sketch of this search follows the list below).

They were:

  • The number of neighbours to take into account (3 or 5)
  • The weight function used (‘uniform’ or ‘distance’)
  • How the distance was calculated (‘Manhattan’ or ‘Euclidean’).
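A sketch of that search with scikit-learn; since the space only has eight combinations, an exhaustive GridSearchCV is shown here in place of a randomized search (X_train and y_train as in the previous sketch):

```python
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

param_grid = {
    "n_neighbors": [3, 5],
    "weights": ["uniform", "distance"],
    "metric": ["manhattan", "euclidean"],
}

search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid,
    scoring="f1",                                     # pick the combination with the best F1-score
    cv=StratifiedKFold(n_splits=4, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)                          # same training data as in the previous sketch
print(search.best_params_, search.best_score_)
```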

The results showed that the optimized classifier was 2.6 percentage points (F1-score) better than the default one, with a final validation set F1-score of 75.09%.

The ideal parameters were 5 neighbors, with distance-based weights and the Euclidean metric. Weighting neighbors by distance instead of uniformly means that closer neighbors have more influence on the final prediction.

Finally, all that was left to do was assess the model’s performance on the test set. The results were the following:

  • Balanced accuracy: 80.01%
  • AUC: 85.61%
  • F1-Score: 71.88%

You can find the confusion matrix of the test set and the ROC below.

ROC of the KNN.
ROC for the KNN by author
Confusion Matrix of the Test Set.
Confusion Matrix of the test set using the KNN by author

What to say about these results? Well, they are not amazing, but I do think it’s pretty incredible how well 8 numbers can represent the information in an image! After all, pretty much all of the negative samples were accurately detected, while more than half of the positive samples were detected correctly.

The goal was also not to get the best results, but more to demonstrate that GLCMs are a valuable tool to have in your machine learning arsenal.

To do that we:

  • Looked at how GLCMs are calculated and why
  • Saw how they can be quantitatively transformed into information
  • Used them in a practical scenario

I hope that I proved my point! What do you think?

I hope you enjoyed my first medium post! If you have any questions, feel free to ask them at mgrc99@gmail.com.

You can also connect with me on LinkedIn: https://www.linkedin.com/in/martim-chaves-081796186/

Thank you very much for reading :)

PS: I tend to use percentages for metrics — that’s just personal preference
