A standard, but crucial, functionality in image processing tasks
Several problems in image processing become easier once we know the most common color(s) in an image or object. In agriculture, for example, we might want to determine the maturity of a fruit such as an orange or a strawberry. We can check whether the color of the fruit falls within a predetermined range to decide whether it is mature, rotten, or too young.

As usual, we can solve this with Python plus simple yet powerful libraries like NumPy, Matplotlib, and OpenCV. I will demonstrate several ways to find the most frequent color in an image using these packages.
Step 1 – Load Packages
We’ll load the basic packages here and load some more as we go along. Also, since we are programming in Jupyter, let’s not forget to include the %matplotlib inline command.
Step 2 – Load and show sample images
In this tutorial, we will be showing two images side by side a lot. So, let’s make a helper function to do so.
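A minimal sketch of such a helper could look like the following. The name show_img_compar and the two solid-color stand-in images are assumptions made for illustration; in practice the images would come from files. Note that images loaded with OpenCV's cv2.imread are in BGR order and need cv2.cvtColor(img, cv2.COLOR_BGR2RGB) before Matplotlib shows their colors correctly.

```python
import numpy as np
import matplotlib.pyplot as plt


def show_img_compar(img_1, img_2):
    """Draw two RGB images side by side and return the figure (hypothetical helper)."""
    fig, axes = plt.subplots(1, 2, figsize=(10, 5))
    for ax, img in zip(axes, (img_1, img_2)):
        ax.imshow(img)
        ax.axis("off")  # hide the axis ticks, we only care about the picture
    fig.tight_layout()
    return fig


# Stand-in images (assumption): two solid-color RGB squares instead of real photos.
img_a = np.full((50, 50, 3), (200, 30, 30), dtype=np.uint8)
img_b = np.full((50, 50, 3), (30, 30, 200), dtype=np.uint8)
fig = show_img_compar(img_a, img_b)
```

With %matplotlib inline active, the figure is rendered directly below the cell.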
Next, we’ll load some sample images that we’ll be using in this tutorial and show them using the function above.

Now we are ready. Time to find out the most common color(s) in these images.
Method 1 – Average
The first method is the easiest, but also the least effective: simply find the average pixel value.
Using NumPy’s average function, we can easily get the average pixel value across the image’s rows and columns (height and width) with axis=(0,1).
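Here is a small sketch of that call. It uses a synthetic half-black, half-white image as a stand-in (an assumption, not one of the sample images), so we can check the result by hand.

```python
import numpy as np

# Stand-in image (assumption): left half black, right half white, RGB.
img = np.zeros((100, 100, 3), dtype=np.uint8)
img[:, 50:] = 255

# Average over rows (axis 0) and columns (axis 1); one value per channel remains.
avg_color = np.average(img, axis=(0, 1))
print(avg_color)  # [127.5 127.5 127.5]
```

The result is a mid-gray that does not appear anywhere in the image.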

We can see that the average method can give misleading or inaccurate results: the colors it reports as most common are a bit off. This is because the average takes all pixel values into account. It becomes really problematic for images with high contrast (both "light" and "dark" regions in one image), which is much clearer in the second image.
There, it gave us a somewhat new color that is not noticeable anywhere in the image.
Method 2 – Highest Pixel Frequency
The second method will be a bit more accurate than the first one. We’ll simply count the number of occurrences of each pixel value.
Fortunately for us, NumPy again provides a function that does exactly this. But first, we must reshape the image data into a list of pixels, each holding 3 values (one for each of the R, G, and B channel intensities).

We can simply use NumPy’s reshape function to get the list of pixel values.
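As a quick illustration, here is what the reshape does to the array shape. The tiny 4 x 6 image is a stand-in chosen only to keep the numbers readable.

```python
import numpy as np

# Stand-in image (assumption): 4 rows, 6 columns, 3 channels.
img = np.zeros((4, 6, 3), dtype=np.uint8)

# -1 lets NumPy work out the number of pixels; each row is one (R, G, B) triple.
pixels = img.reshape(-1, 3)
print(img.shape, "->", pixels.shape)  # (4, 6, 3) -> (24, 3)
```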

Now that we have the data in the right structure, we can start counting the frequency of the pixel values. We can just use NumPy’s unique function with the parameter return_counts=True.
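A sketch of the counting step, on a hand-made list of six pixels so the answer is easy to verify. Passing axis=0 is an assumption about how to apply unique here: it makes NumPy treat each row (one whole pixel) as a single item instead of counting individual channel values.

```python
import numpy as np

# Stand-in pixel list (assumption): black appears three times, red twice.
pixels = np.array(
    [[0, 0, 0], [0, 0, 0], [255, 0, 0], [0, 0, 0], [255, 0, 0], [10, 20, 30]],
    dtype=np.uint8,
)

# Unique rows and how often each one occurs.
colors, counts = np.unique(pixels, axis=0, return_counts=True)

# The color with the highest count is the most frequent pixel value.
most_common = colors[counts.argmax()]
print(most_common, counts.max())  # [0 0 0] 3
```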

Done. Let’s run it on our images.

This makes more sense than the first one, right? The most common colors are in the black area. But we can go further. What if we take not just the single most common color, but several? Using the same concept, we can take the top N most common colors. However, if you look at the first image, many of the highest-frequency colors would most likely be near-identical neighbors, differing by only a few intensity levels.
In other words, we want the most common color clusters that are actually different from one another.
Method 3 – Using K-Means clustering
The scikit-learn package comes to the rescue. We can use the well-known K-Means clustering algorithm to group similar colors together.
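A minimal sketch of the clustering step with scikit-learn's KMeans might look like this. The noisy two-color pixel list, the variable names, and the choices n_init=10 and random_state=0 are all assumptions for illustration; a real image would first be reshaped into a list of pixels.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in pixel list (assumption): 300 dark pixels and 100 orange ones, with noise.
rng = np.random.default_rng(0)
dark = rng.normal((20, 20, 20), 5, size=(300, 3))
orange = rng.normal((230, 120, 30), 5, size=(100, 3))
pixels = np.clip(np.vstack([dark, orange]), 0, 255)

# Group the pixels into k color clusters; k has to be chosen by hand.
clt = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = clt.fit_predict(pixels)  # cluster index of every pixel
centers = clt.cluster_centers_    # one representative (R, G, B) color per cluster
print(centers.round())
```

Each cluster center is the mean color of the pixels assigned to it, so the noise is averaged away and the two underlying colors are recovered.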

Easy, right? Now, all we need is a function to display the color clusters above.
We simply create an image with a height of 50 and a width of 300 pixels to display the color groups as a palette, and give each color cluster an equal-width band of it.
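One way to sketch such a palette function is shown below. The signature and the three hand-picked colors are assumptions; in the tutorial the colors would be the cluster centers from K-Means.

```python
import numpy as np


def palette(centers, width=300, height=50):
    """Build a height x width RGB strip with one equal-width band per color."""
    centers = np.asarray(centers)
    pal = np.zeros((height, width, 3), dtype=np.uint8)
    # Band boundaries in pixels, e.g. [0, 100, 200, 300] for three colors.
    edges = np.linspace(0, width, len(centers) + 1).astype(int)
    for idx, color in enumerate(centers):
        pal[:, edges[idx]:edges[idx + 1]] = np.round(color)
    return pal


# Stand-in cluster centers (assumption): pure red, green, and blue.
pal = palette([[255, 0, 0], [0, 255, 0], [0, 0, 255]])
print(pal.shape)  # (50, 300, 3)
```

The returned array can be passed to Matplotlib's imshow like any other image.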

Beautiful, isn’t it? K-Means clustering gives great results for the most common colors in the images. In the second image, we can see that there are too many shades of brown in the palette, most likely because we picked too many clusters. Let’s see if we can fix it by choosing a smaller value of k.

Yep, that solved it. Since we use K-Means clustering, we still have to determine the appropriate number of clusters ourselves. Three clusters seem to be a good choice.
But we can still improve on these results and, at the same time, get some help with choosing the number of clusters.
How about we also show each cluster’s proportion of the whole image?
Method 3.1 – K-Means + Proportion display
All we need to do is modify our palette function. Instead of using fixed steps, we make the width of each cluster proportional to the number of pixels in that cluster.
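The proportional version could be sketched as follows. The name palette_perc, its signature, and the stand-in labels are assumptions; labels is expected to hold one integer cluster index per pixel, as returned by K-Means.

```python
import numpy as np


def palette_perc(centers, labels, width=300, height=50):
    """Build a palette whose band widths follow each cluster's share of pixels."""
    centers = np.asarray(centers)
    counts = np.bincount(labels, minlength=len(centers))
    fractions = counts / counts.sum()       # share of pixels per cluster
    order = np.argsort(fractions)[::-1]     # largest cluster first
    # Cumulative shares turned into pixel boundaries, e.g. [0, 225, 300].
    cumulative = np.concatenate([[0.0], np.cumsum(fractions[order])])
    edges = np.rint(cumulative * width).astype(int)
    pal = np.zeros((height, width, 3), dtype=np.uint8)
    for i, idx in enumerate(order):
        pal[:, edges[i]:edges[i + 1]] = np.round(centers[idx])
    return pal, fractions


# Stand-in data (assumption): 75% of pixels in a black cluster, 25% in a white one.
labels = np.array([0] * 75 + [1] * 25)
pal, fractions = palette_perc([[0, 0, 0], [255, 255, 255]], labels)
print(fractions)  # [0.75 0.25]
```

Here the black band takes up 225 of the 300 columns and the white band the remaining 75.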

Much better.
Not only does it give us the most common colors in the images, it also gives us the proportion of pixels that belongs to each cluster.
It also helps answer how many clusters we should use. For the top image, two to four clusters seem reasonable. For the second image, it looks like we need at least two clusters. The reason we don’t use one cluster (k=1) is that we’ll run into the same problem as the average method.

Conclusion
We have covered several techniques for getting the most common colors in images using Python and several well-known libraries, and we’ve seen the advantages and disadvantages of each. So far, K-Means with k > 1 is one of the best solutions for finding the most frequent colors in images (at least compared to the other methods we’ve gone through).
Let me know if you have problems with the script, either in the comments or on my GitHub.