The Shape of Coffee

Particle Analysis using Pattern Recognition

Robert McKeon Aloe
Towards Data Science

Making coffee requires grinding, but which grinder is best? I’m not sure, but I have seen quite a bit of data on coffee grind distributions. Some grinders are better than others simply because they have a tighter particle size distribution, but is the distribution the only thing to consider?

I initially explored the particle sizes because I had written code to analyze filter holes. The same processing could be used, but I found two main difficulties:

  1. It is difficult to get an image where the grounds aren’t touching.
  2. The images only resolved about 100 µm per pixel, so it was hard to get a good sense of the finer particles.

Additionally, the particles were not circles. I didn’t think much of this at the time because I had other experiments running, but a year later, I got a microscope. It turns out most grounds aren’t circular. They aren’t square either. So how does one make sense of them? How do you determine the size of a particle? Radius doesn’t quite make sense for a non-circular shape, but area might.

If you use area as the measure of particle size, could two grinders have very similar size distributions but produce different results in the cup? I was plagued by this question, and in trying to understand coffee grounds, I turned to pattern recognition.

In this work, I do some simple pre-processing and then apply standard pattern recognition techniques. For each particle, I compute a Local Binary Pattern (LBP) feature, which is scale and rotation invariant. Then I apply k-means clustering to group the particles into types.

I don’t know if this will turn out to be a useful new way to understand coffee grinds, but I intend to use it in the following experiments:

  1. Grind setting
  2. Sifted grinds
  3. Grinder comparison
  4. Grounds shape before vs. after the shot

Initial Processing

I put coffee either on a white piece of paper or on an iPad screen and take a picture. In this work, I’m not worried about scale, though that would be relatively easy to incorporate later. The key here is the processing: I invert the image, take the product R*G*B for each pixel, and apply a simple threshold to get a binary image of the particles.
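Those steps can be sketched in Python with NumPy. This is a minimal sketch, not the original processing code. It assumes an 8-bit RGB array with dark grounds on a light background. The threshold is a hypothetical tuning value, and splitting the mask into individual particles with a 4-connected flood fill is an assumption about how particles are separated.

```python
import numpy as np
from collections import deque


def label_particles(mask: np.ndarray) -> tuple[np.ndarray, int]:
    """4-connected flood fill: give every touching blob of True pixels its own label (1..n)."""
    labels = np.zeros(mask.shape, dtype=np.int32)
    h, w = mask.shape
    n = 0
    for y, x in zip(*np.nonzero(mask)):
        if labels[y, x]:
            continue  # already part of a particle
        n += 1
        labels[y, x] = n
        queue = deque([(y, x)])
        while queue:
            cy, cx = queue.popleft()
            for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not labels[ny, nx]:
                    labels[ny, nx] = n
                    queue.append((ny, nx))
    return labels, n


def segment_grounds(rgb: np.ndarray, threshold: float) -> tuple[np.ndarray, int]:
    """Invert an 8-bit RGB photo, multiply the channels, threshold, and label particles.

    `threshold` is a hypothetical value on the 0..1 scale of the channel product.
    """
    # Invert so dark coffee on a light background becomes bright (values in 0..1).
    inv = 1.0 - rgb.astype(np.float64) / 255.0
    # Product of the inverted channels is high only where all three channels were dark.
    product = inv[..., 0] * inv[..., 1] * inv[..., 2]
    mask = product > threshold
    return label_particles(mask)
```

For a photo loaded as an array, `labels, n = segment_grounds(photo, threshold=0.3)` gives one integer label per particle, and `np.bincount(labels.ravel())[1:]` gives each particle’s area in pixels. Multiplying the channels rather than averaging them suppresses pixels that are dark in only one channel.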

In a close-up image, you can already see the variety of particle shapes.

Initially, I looked at simple metrics like the ratio of the major axis to the minor axis, which assumes each particle is an ellipse. The ratio definitely had a spread, and the major and minor axes were not linearly related. I suspect this reflects the variety of particle shapes.
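One common way to get major and minor axes is to fit the ellipse that has the same second moments as the particle’s pixels; this is the definition MATLAB’s regionprops documents, up to a small pixel-size correction. The sketch below assumes that definition and takes a boolean mask of a single particle. For a filled ellipse with semi-axis a, the variance of the pixel coordinates along that axis is a²/4, so the full axis length is 4·sqrt(variance).

```python
import numpy as np


def ellipse_axes(mask: np.ndarray) -> tuple[float, float]:
    """Major and minor axis lengths of the ellipse with the same second moments as `mask`."""
    ys, xs = np.nonzero(mask)
    coords = np.stack([xs, ys]).astype(np.float64)
    # 2x2 covariance of the pixel coordinates (population normalisation).
    cov = np.cov(coords, bias=True)
    # eigvalsh returns eigenvalues in ascending order: variance along minor, then major axis.
    small, large = np.linalg.eigvalsh(cov)
    # Full axis length = 2 * semi-axis = 4 * sqrt(variance along that axis).
    return float(4.0 * np.sqrt(large)), float(4.0 * np.sqrt(small))
```

The aspect ratio is then `major / minor`. It is 1 for a disk and grows as the particle gets more elongated, but very different shapes can share the same ratio.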

I then compared the major-to-minor axis ratio against particle area. Again, there is a spread, and the spread changes with particle area.

Pattern Recognition for Shape Analysis

Coffee particles aren’t circular. Do different grind settings or different grinders change the shape of the particles coming out? How could we understand the different shapes?

Local Binary Patterns

One technique is to use Local Binary Patterns (LBP). An LBP code is computed per pixel: each of the eight surrounding pixels is compared to the center pixel (neighbor > center), and each comparison contributes one bit to an 8-bit code. There are 2⁸ = 256 possible codes. The common 59-bin mapping keeps the 58 “uniform” codes, those with at most two 0/1 transitions around the circle, as separate categories and lumps all remaining codes into one. The final feature is a 59-element vector giving the percentage of pixels in an image that fall into each category.
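A hand-rolled NumPy version of that 59-bin feature might look like the sketch below. It assumes the basic 3×3, eight-neighbour LBP with the uniform-pattern mapping; the bin ordering and the choice to skip border pixels are assumptions, and this is not the Matlab code used for the results here.

```python
import numpy as np

# The 8 neighbours in circular order around the centre pixel (row offset, column offset).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def transitions(code: int) -> int:
    """Number of 0/1 changes when walking once around the 8-bit ring."""
    bits = [(code >> i) & 1 for i in range(8)]
    return sum(bits[i] != bits[(i + 1) % 8] for i in range(8))


# 58 uniform codes (<= 2 transitions) get bins 0..57; every other code shares bin 58.
UNIFORM_CODES = [c for c in range(256) if transitions(c) <= 2]
LOOKUP = np.full(256, 58, dtype=np.int64)
for b, c in enumerate(UNIFORM_CODES):
    LOOKUP[c] = b


def lbp_histogram(gray: np.ndarray) -> np.ndarray:
    """59-bin uniform LBP histogram, in percent, of a 2-D grayscale image (borders skipped)."""
    g = gray.astype(np.float64)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbour = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbour is brighter than the centre pixel.
        codes |= (neighbour > center).astype(np.int64) << bit
    counts = np.bincount(LOOKUP[codes].ravel(), minlength=59)
    return 100.0 * counts / counts.sum()
```

Applied to each particle’s image crop, this gives one 59-element row per particle. Because the histogram is in percent, a large and a small particle with similar texture produce similar vectors.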

k-Means Clustering

Afterwards, we can apply k-means clustering to see how the different shapes group into categories. K-means clustering places K cluster centers (in this case 16) and assigns each feature vector to its closest center. The algorithm then alternates between moving each center to the mean of its assigned features and reassigning the features, until the assignments stop changing.
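The alternating steps described above are Lloyd’s algorithm. Here is a bare-bones NumPy sketch, assuming random initialization from the data points; library versions (including Matlab’s) add smarter seeding and multiple restarts, which this sketch leaves out.

```python
import numpy as np


def kmeans(X: np.ndarray, k: int, iters: int = 100, seed: int = 0):
    """Plain Lloyd's k-means. X is (n_samples, n_features). Returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise with k distinct samples chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(np.float64)
    labels = np.full(len(X), -1)
    for _ in range(iters):
        # Assignment step: squared Euclidean distance from every sample to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each center to the mean of its members (keep it if empty).
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers, labels
```

With X holding one LBP feature row per particle, `kmeans(X, 16)` returns 16 centers and a cluster label for every particle.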

The two-cluster example from Matlab

Selecting the number of clusters is more of an art than a science. The main metric is the within-cluster sum of squares (WSS): it keeps decreasing as you add clusters, but eventually you reach diminishing returns, which in this case happened at 16 clusters.
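WSS is simple to compute from a clustering, and the “diminishing returns” point can be found by eye or with a rule of thumb. The sketch below assumes one such rule, picking the first K where adding a cluster improves WSS by less than a hypothetical `min_gain` fraction. It is an illustration, not necessarily how 16 was chosen here.

```python
import numpy as np


def wss(X: np.ndarray, labels: np.ndarray) -> float:
    """Within-cluster sum of squares: squared distance of each sample to its cluster mean."""
    total = 0.0
    for j in np.unique(labels):
        members = X[labels == j]
        total += float(((members - members.mean(axis=0)) ** 2).sum())
    return total


def pick_elbow(ks, wss_values, min_gain=0.1):
    """Return the first k whose next step improves WSS by less than `min_gain` (relative).

    `min_gain` is a hypothetical cut-off; judging the curve by eye is just as common.
    """
    for i in range(len(ks) - 1):
        gain = (wss_values[i] - wss_values[i + 1]) / wss_values[i]
        if gain < min_gain:
            return ks[i]
    return ks[-1]
```

In practice you would run k-means for a range of K, record `wss(X, labels)` for each, and plot the curve before trusting any automatic cut-off.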

Now we can compare particle areas across these clusters, both globally and normalized within each particle-area bin.
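The normalized view amounts to a cluster-by-area-bin count table in which each area bin’s column is scaled to sum to 1. A minimal sketch follows, assuming per-particle areas, cluster labels, and hand-picked bin edges. The edges are hypothetical, and particles outside them are simply dropped.

```python
import numpy as np


def cluster_share_per_area_bin(areas, labels, bin_edges, n_clusters):
    """Rows = clusters, columns = area bins; each non-empty column sums to 1."""
    areas = np.asarray(areas)
    labels = np.asarray(labels)
    # np.digitize returns 1..len(bin_edges)-1 for areas inside the edges; shift to 0-based.
    bins = np.digitize(areas, bin_edges) - 1
    n_bins = len(bin_edges) - 1
    counts = np.zeros((n_clusters, n_bins))
    for b, c in zip(bins, labels):
        if 0 <= b < n_bins:  # ignore particles outside the chosen edges
            counts[c, b] += 1
    totals = counts.sum(axis=0, keepdims=True)
    # Divide each column by its total, leaving empty bins as zeros.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
```

The global view is just the unnormalized count per cluster, for example `np.bincount(labels, minlength=n_clusters)`.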

I’m not sure what this says yet, only that it is a new way to look at particle size and distribution. It’s a pretty graph, but at the end of the day, what could it tell us? Could this information help differentiate good grinders from one another?

I’ve seen quite a few particle distributions posted online, but it hasn’t been clear to me how to use that information alone to tell the best grinders apart from the pretty good, the good, the bad, and the worst. You could probably tell that a grinder is not great, but I feel like something is missing. In my experience, my $200 Rok grinder makes great espresso even though it is not considered a great grinder compared to ones that cost a few thousand dollars. Hopefully, we as a community will find some objective metrics so people will have an easier time deciding whether to drop a few grand on a grinder.

I’m in love with my Wife, my Kids, Espresso, Data Science, tomatoes, cooking, engineering, talking, family, Paris, and Italy, not necessarily in that order.