
Finding Most Common Colors in Python

A standard, yet crucial, image processing task.


There are several use cases in image processing that can be solved if we know the most common color(s) of an image or object. In agriculture, for example, we might want to determine the maturity of a fruit such as an orange or a strawberry. We can simply check whether the color of the fruit falls within a predetermined range to see if it is ripe, rotten, or not yet ripe.

Photo by Sarah Gualtieri on Unsplash

As usual, we can solve this problem using Python plus simple yet powerful libraries like NumPy, Matplotlib, and OpenCV. I will demonstrate several ways to find the most frequent color in an image using these packages.

Step 1 – Load Packages

We'll load the basic packages here and add more as we go along. Also, since we are working in Jupyter, let's not forget to include the %matplotlib inline command.

Step 2 – Load and show sample images

Throughout this tutorial we'll often show two images side by side, so let's write a helper function to do that.
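A minimal sketch of such a helper follows. The name `show_side_by_side` and the default figure size are illustrative choices, not the author's original code; it assumes both inputs are RGB arrays that Matplotlib's `imshow` can display.

```python
import numpy as np
import matplotlib.pyplot as plt


def show_side_by_side(img_1, img_2, figsize=(10, 5)):
    """Display two RGB images next to each other, without axes."""
    fig, axes = plt.subplots(1, 2, figsize=figsize)
    for ax, img in zip(axes, (img_1, img_2)):
        ax.imshow(img)
        ax.axis("off")  # hide ticks and the frame around each image
    fig.tight_layout()
    plt.show()


# Example: a black image next to a white one
black = np.zeros((50, 50, 3), dtype=np.uint8)
white = np.full((50, 50, 3), 255, dtype=np.uint8)
show_side_by_side(black, white)
```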

Next, we’ll load some sample images that we’ll be using in this tutorial and show them using the function above.

Source: Images by Author

Now we are ready. Time to find out the most common color(s) in these images.

Method 1 – Average

The first method is the easiest (but least effective): simply compute the average pixel value.

Using NumPy's average function, we can easily get the average pixel value across the height and width of the image by passing axis=(0,1).
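Here is a short sketch of the average method, plus a hypothetical `color_patch` helper that fills an image with a single color so the result can be displayed next to the original. It assumes an RGB `uint8` image of shape `(height, width, 3)`. The tiny half-black, half-white test image also illustrates the weakness discussed below: its average is a mid-gray that appears nowhere in the image.

```python
import numpy as np


def average_color(img):
    """Mean R, G, B values over all rows and columns of an RGB image."""
    return np.average(img, axis=(0, 1))


def color_patch(color, shape):
    """A (height, width, 3) uint8 image filled with a single RGB color."""
    height, width = shape
    rgb = np.clip(np.round(color), 0, 255).astype(np.uint8)
    return np.full((height, width, 3), rgb, dtype=np.uint8)


# Synthetic example: left half black, right half white
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[:, 2:] = 255
avg = average_color(img)                 # [127.5, 127.5, 127.5]
patch = color_patch(avg, img.shape[:2])  # mid-gray, a color absent from img
```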

Most common color #1 - average method

We can see that the average method can give misleading or inaccurate results: the most common colors it returns are a bit off. This is because the average takes every pixel value into account, which is especially problematic for high-contrast images (with both "light" and "dark" regions in one image). The effect is much clearer in the second image.

There, the average produces a new color that doesn't visibly appear anywhere in the image.

Method 2 – Highest Pixel Frequency

The second method is a bit more accurate than the first one. We'll simply count how many times each pixel value occurs.

Fortunately for us, NumPy again provides a function that does exactly this. But first, we must reshape the image so that it becomes a list of pixels, each holding 3 values (one intensity each for the R, G, and B channels).

We can simply use NumPy's reshape function to get this list of pixel values.
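To make the reshape concrete, here is a toy example on a synthetic 2×3 image (not one of the article's images): `reshape(-1, 3)` merges the height and width axes into one, leaving one row of R, G, B values per pixel.

```python
import numpy as np

# Synthetic 2x3 RGB image: shape (height, width, channels)
img = np.arange(18, dtype=np.uint8).reshape(2, 3, 3)

# Merge the height and width axes: one row per pixel, columns are R, G, B
pixels = img.reshape(-1, 3)
print(img.shape, "->", pixels.shape)  # (2, 3, 3) -> (6, 3)
```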

Now that the data is in the right structure, we can count the frequency of each pixel value. We can just use NumPy's unique function with the parameters axis=0 (so that each pixel, rather than each individual channel value, is counted as one item) and return_counts=True.
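Putting the reshape and the count together gives a sketch like the one below. Note the `axis=0` argument: without it, `np.unique` would flatten the array and count individual channel values rather than whole pixels. The function name `most_frequent_color` is hypothetical.

```python
import numpy as np


def most_frequent_color(img):
    """Return the most frequent RGB value in an image and its pixel count."""
    pixels = img.reshape(-1, 3)
    # axis=0 counts whole rows (pixels) rather than individual channel values
    colors, counts = np.unique(pixels, axis=0, return_counts=True)
    best = np.argmax(counts)
    return colors[best], counts[best]


# Synthetic example: 7 black pixels and 2 red ones
img = np.zeros((3, 3, 3), dtype=np.uint8)
img[0, :2] = [255, 0, 0]
color, count = most_frequent_color(img)
print(color, count)  # [0 0 0] 7
```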

Done. Let's run it on our images.

Most common color #2 - frequency method

This makes more sense than the first method, right? The most common colors come from the black areas. But we can go further. What if we take not just the single most common color, but several? Using the same concept, we can take the top N most common colors. However, if you look at the first image, many of the highest-frequency colors would most likely be neighboring shades that differ by only a few intensity levels.
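Taking the top N colors only requires sorting the counts instead of taking the maximum. In this sketch (hypothetical function name `top_n_colors`), `np.argsort` sorts in ascending order, so the indices are reversed to put the most frequent colors first. On real photos, the returned colors will often be near-identical shades, which is exactly the problem described above.

```python
import numpy as np


def top_n_colors(img, n=5):
    """Return the n most frequent RGB values (most frequent first) and their counts."""
    colors, counts = np.unique(img.reshape(-1, 3), axis=0, return_counts=True)
    order = np.argsort(counts)[::-1][:n]  # indices by descending count
    return colors[order], counts[order]
```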

In other words, we want to take the most common, different color clusters.

Method 3 – Using K-Means clustering

The scikit-learn package comes to the rescue: we can use the well-known K-Means clustering algorithm to group similar colors together.
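A sketch of the clustering step with scikit-learn's `KMeans`, assuming an RGB image array: every pixel becomes a 3-dimensional point, and each fitted cluster center (`cluster_centers_`) is the mean color of the pixels in that cluster. The helper name `color_clusters`, the default `k=5`, and the fixed `random_state` (for reproducible results) are illustrative choices. Fitting on every pixel of a large photo can be slow, so downscaling the image first is a common shortcut.

```python
import numpy as np
from sklearn.cluster import KMeans


def color_clusters(img, k=5, seed=0):
    """Fit K-Means on the RGB values of all pixels and return the fitted model.

    model.cluster_centers_ holds one representative (mean) color per cluster,
    and model.labels_ holds the cluster index of every pixel.
    """
    pixels = img.reshape(-1, 3).astype(float)
    model = KMeans(n_clusters=k, n_init=10, random_state=seed)
    model.fit(pixels)
    return model
```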

Easy, right? Now all we need is a function that turns the resulting color clusters into a palette, so we can display them right away.

We simply create an image with a height of 50 and a width of 300 pixels to hold the color palette, then fill an equal-width section of it with each cluster's color.
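A sketch of such a palette function, using the 50 × 300 pixel size described above. The name `palette` is hypothetical; the strip boundaries come from `np.linspace` so the strips always span the full width, even when 300 isn't divisible by the number of clusters. With a fitted K-Means model, passing its `cluster_centers_` to this function and the result to `plt.imshow` would display the palette.

```python
import numpy as np


def palette(centers, height=50, width=300):
    """Equal-width vertical strips, one per cluster center (RGB)."""
    pal = np.zeros((height, width, 3), dtype=np.uint8)
    # Strip boundaries spanning the full width, even if width % k != 0
    bounds = np.linspace(0, width, len(centers) + 1).round().astype(int)
    for center, start, end in zip(centers, bounds[:-1], bounds[1:]):
        pal[:, start:end] = np.clip(np.round(center), 0, 255).astype(np.uint8)
    return pal
```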

Most common colors #3 - K-means clustering

Beautiful, isn't it? K-Means clustering gives great results in terms of finding the most common colors in the images. In the second image, however, there are too many shades of brown in the palette. This is most likely because we picked too many clusters. Let's see if we can fix it by choosing a smaller value of k.

Yep, that solved it. Since we use K-Means clustering, we still have to determine the appropriate number of clusters ourselves. Three clusters seem to be a good choice.

But we can still improve on these results and, at the same time, get help with choosing the number of clusters.

How about we also show the proportion of the clusters towards the whole image?

Method 3.1 – K-Means + Proportion display

All we need to do is modify our palette function. Instead of using fixed-width steps, we make the width of each cluster's strip proportional to the number of pixels in that cluster.
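One way to sketch the proportional version: count the pixels per cluster from the K-Means labels (`labels_`), turn the counts into shares of the whole image, and use the cumulative shares as strip boundaries. The function name is hypothetical, and ordering the strips from largest to smallest cluster is a presentational choice here, not necessarily what the original code did.

```python
import numpy as np


def palette_with_proportions(labels, centers, height=50, width=300):
    """Palette whose strip widths match each cluster's share of the pixels.

    labels:  cluster index of every pixel (e.g. a fitted KMeans' labels_)
    centers: one RGB color per cluster (e.g. its cluster_centers_)
    Returns the palette image and the share of pixels in each cluster.
    """
    counts = np.bincount(labels, minlength=len(centers))
    shares = counts / counts.sum()
    order = np.argsort(shares)[::-1]  # draw the largest cluster first
    # Cumulative shares -> strip boundaries in pixels (the last one equals width)
    bounds = (np.concatenate(([0.0], np.cumsum(shares[order]))) * width).round().astype(int)
    pal = np.zeros((height, width, 3), dtype=np.uint8)
    for idx, start, end in zip(order, bounds[:-1], bounds[1:]):
        pal[:, start:end] = np.clip(np.round(centers[idx]), 0, 255).astype(np.uint8)
    return pal, shares
```

Returning the shares as well makes it easy to print each cluster's percentage next to the palette.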

Most common colors #3.1 - K-means clustering + proportions

Much better.

Not only does it give us the most common colors in the images, it also shows what proportion of the pixels belongs to each color cluster.

It also helps answer how many clusters we should use. For the top image, two to four clusters seem reasonable. For the second image, it looks like we need at least two clusters. The reason we don't use one cluster (k=1) is that we'd run into the same problem as with the average method.

K-Means with k=1 result
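The k=1 problem is easy to confirm: with a single cluster, the K-Means center is simply the mean of all pixels, which is exactly Method 1. The synthetic half-black, half-white image below (an assumption for illustration, not one of the article's images) makes this visible.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic image: left half black, right half white
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[:, 2:] = 255

pixels = img.reshape(-1, 3).astype(float)
center = KMeans(n_clusters=1, n_init=10, random_state=0).fit(pixels).cluster_centers_[0]
average = np.average(img, axis=(0, 1))
# Both are approximately [127.5, 127.5, 127.5]: a gray that appears nowhere in img
print(center, average)
```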

Conclusion

We have covered several techniques for finding the most common colors in images using Python and a few well-known libraries, and we've seen the advantages and disadvantages of each. So far, K-Means with k > 1 is one of the best ways to find the most frequent colors in images (at least compared to the other methods covered here).

Let me know in the comments, or on my GitHub, if you run into problems with the script.

