Coffee Data Science

A Review of Coffee Data: Grades and Flavors

How flavors potentially impact coffee grading

Robert McKeon Aloe
Towards Data Science
7 min read · Nov 20, 2020

After trying a variety of coffees from around the world, I have often wondered how flavor differences affect grading (Q-grades or Cupping Grades). Even though I have found a general correlation between coffee grade and taste, I have really enjoyed even lower graded coffees. I have looked at two databases with coffee grades, and there are definitely regional differences, but I still didn’t have an idea how more specific flavors played a role.

So I amended one of the databases, the one from Sweet Maria’s. I had previously pulled their Q-scores, but they also had flavor ratings for each coffee. So I went back and pulled the flavor ratings for all the beans. I ended up with a slightly larger database than before: 407 coffees.

This is the first of three articles using this dataset. The next article will focus on using cupping grades and flavors to determine how similar coffees from different regions or processes are to each other. The third article will focus on comparing Sweet Maria’s cupping protocol to the SCA protocol using the CQI database.

Cupping Scores (modified Q-scores)

Sweet Maria’s uses slightly different cupping criteria than the SCA criteria summarized below. I was curious to see how sweetness, uniformity, and clean cup compare to the other data. Whereas these 3 metrics start out perfect on the SCA scale and points are deducted, Sweet Maria’s metrics give a bit more insight into the coffee.

Data

As with any other database, building this one took time to clean up and check. I pulled the spider graphs for the Q-scores, and I used my previous code to extract the sub-metric scores. For Flavors, I modified that code to allow easy extraction of those scores as well.

Used with Permission by Sweet Maria’s; All other images by author

I compiled this into a large table with metadata like region of origin and processing type. I combed through the data multiple times looking for discrepancies, and I sampled the data to check that the algorithm was doing a good job.

Finally, I was ready for some analysis.

Analysis: Flavor Distribution

I looked at the 12 flavor metrics and added an average. Nut and Floral flavors were the least common, while Sugars, Cocoa, and Body were more common.

Analysis: Correlation

Correlation is a metric that says how closely two variables move together, on a scale from -1 to 1. High correlation doesn’t mean one variable causes the other, only that both variables tend to go up or down together when things change. I would assume from the start that some grading variables would have a high correlation because they are looking at taste from different points in time. Correlation can be positive (the variables trend the same way) or negative (they trend inversely to each other). A value of 0 means there is no correlation.
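To make the idea concrete, here is a minimal sketch of the standard Pearson correlation coefficient. The article does not say which correlation measure was used, so Pearson is an assumption, and the scores below are made up for illustration rather than taken from the dataset.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Unnormalized covariance and the two unnormalized variances.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up scores for five hypothetical coffees (not from the dataset).
sweetness = [8.0, 8.5, 9.0, 7.5, 8.8]
body = [7.9, 8.4, 9.1, 7.6, 8.7]      # rises with sweetness
rustic = [3.0, 2.0, 1.0, 4.0, 1.5]    # falls as sweetness rises

print(pearson(sweetness, body))    # close to +1
print(pearson(sweetness, rustic))  # close to -1
```

The two toy pairs show the positive and negative cases described above; neither says anything about causation.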

Unsurprisingly, the Cupping Scores (Q-scores, grades, whatever you’d like to call them) correlate more strongly with each other than with the Flavors. Some interesting notes are that Caramel, Cocoa, Nut, and Rustic flavors are inversely correlated with most of the cupping scores. They are also negatively correlated, though more weakly, with the other flavors. There is a high correlation between Berry and Fruits, which seems reasonable.

We can summarize this larger table by simply identifying, for each attribute, the attribute it correlates with most strongly. A negative sign indicates that the value is the highest in absolute terms but is a negative (inverse) correlation.
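One way this kind of summary could be computed is sketched below with NumPy. The function name `strongest_partner` and the ratings are hypothetical, and this is not necessarily how the author built the table.

```python
import numpy as np

def strongest_partner(data, names):
    """For each column, return (partner name, signed correlation) for the
    other column with the largest absolute Pearson correlation."""
    corr = np.corrcoef(data, rowvar=False)  # columns are the variables
    magnitude = np.abs(corr)
    np.fill_diagonal(magnitude, -1.0)       # exclude self-correlation
    best = magnitude.argmax(axis=1)
    return {names[i]: (names[j], float(corr[i, j])) for i, j in enumerate(best)}

# Made-up ratings for six hypothetical coffees (not from the dataset).
names = ["Brightness", "Floral", "Rustic"]
data = np.array([
    [8.0, 2.0, 1.0],
    [8.5, 3.0, 0.5],
    [7.0, 0.0, 3.0],
    [7.5, 1.0, 2.5],
    [9.0, 4.0, 0.0],
    [7.2, 0.0, 2.0],
])

summary = strongest_partner(data, names)
for name, (partner, r) in summary.items():
    print(f"{name}: {partner} ({r:+.2f})")
```

The partner is chosen by absolute value, but the reported number keeps its sign, which matches the convention of the negative entries in the summary.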

The immediate trend is that many of the Cupping metrics correlate more strongly with the Floral flavor metric than with the other flavor metrics, while many of the Flavor metrics correlate well with the Brightness cupping metric. This is strange because the Floral flavor shows up in only 27% of coffees (it is zero otherwise).

We can break this correlation matrix down by region and ask how each metric correlates to the Total Score (Cupping Score). There are some regional differences, especially for fruits, citrus, and berry. This seems odd to me, since my experience with African beans is that they are more fruity, yet fruity flavors don’t contribute much to high scores on African beans.
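A per-group breakdown like this can be sketched as follows. The helper name `corr_to_total_by_group`, the group labels, and all numbers are placeholders for illustration, not values from the Sweet Maria’s data.

```python
from collections import defaultdict

import numpy as np

def corr_to_total_by_group(groups, metric, total):
    """Pearson correlation between `metric` and `total`, computed
    separately within each group label (e.g. region or process)."""
    rows = defaultdict(list)
    for g, m, t in zip(groups, metric, total):
        rows[g].append((m, t))
    result = {}
    for g, pairs in rows.items():
        m, t = np.array(pairs).T
        result[g] = float(np.corrcoef(m, t)[0, 1])
    return result

# Made-up data: a flavor rating and a total score for two placeholder groups.
groups = ["Region A"] * 4 + ["Region B"] * 4
fruit = [1, 2, 3, 4, 0, 1, 2, 3]
total = [86.0, 87.0, 87.5, 89.0, 88.0, 87.0, 86.5, 85.0]

by_group = corr_to_total_by_group(groups, fruit, total)
print(by_group)  # positive for Region A, negative for Region B
```

The same function works for processing type by passing process labels instead of region labels.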

Part of these scores is also tied to how the coffee fruit is processed to make the green beans. I prefer dry process myself because it is more fruity, and the flavor scores below definitely show that. But they tend to have an inverse correlation to caramel. I would have thought Caramel to be similar to Sugars, but they don’t have much of a correlation.

The other observation is that Nut and Cocoa are only weakly correlated with Total Score, except for South American beans and Blends, where they have a strong negative correlation.

The Nut and Cocoa are inversely correlated to the dry processed beans, but their correlation gets weaker for Honey and Dry. Overall, the more flavor in African beans, the better the overall cupping score. Cupper’s Correction is negatively correlated for Dry and Other Processed, which indicates to me that those get lower scores overall, but really they are great beans that aren’t quite quantified by the cupping metrics.

Analysis: Principal Component Analysis (PCA)

PCA transforms a set of variables into a new space where the new dimensions are ordered by how much of the original variability they account for. A simple dataset could be reduced in the number of dimensions with little loss of fidelity; in this case, each Q-score (of Sweet Maria’s grading) is represented by 11 dimensions, but maybe you don’t need so many to represent the total score. Maybe you only need three Principal Components (PCs).
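As a rough sketch of the technique, PCA can be computed from the singular value decomposition of the mean-centered data. The article does not state which tool or scaling was used, so the function below (a hypothetical `pca` helper that centers but does not rescale the columns) and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def pca(data):
    """PCA via SVD of the mean-centered data.

    Returns (components, explained_ratio, scores):
      components      - one principal axis per row, strongest first
      explained_ratio - fraction of total variance captured by each PC
      scores          - the data projected onto the principal axes
    """
    centered = data - data.mean(axis=0)
    _, singular, components = np.linalg.svd(centered, full_matrices=False)
    variance = singular ** 2 / (len(data) - 1)
    explained_ratio = variance / variance.sum()
    scores = centered @ components.T
    return components, explained_ratio, scores

# Synthetic stand-in for a table of sub-scores: 50 "coffees", 4 metrics,
# where the last two metrics are nearly copies of the first two.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 2))
data = np.hstack([base, base + 0.05 * rng.normal(size=(50, 2))])

components, explained_ratio, scores = pca(data)
print(explained_ratio.round(3))  # the first two PCs carry almost all variance
```

Because two of the four synthetic columns are near duplicates, two PCs are enough to describe the data, which is the kind of reduction the paragraph above has in mind.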

We can start by looking at all the variables. Unsurprisingly, the total Cupping Score is the most dominant variable in the first, most dominant PC. After that, however, the flavor components seem more dominant in distinguishing coffee beans.

We can perform this same analysis without the Total Score from the Cupping Scores. The Cupper’s Correction had an interesting impact. Flavors still dominate the impact on the individual PCs.

We can further look at the impact of the Cupper Scores vs Flavors for the total absolute impact on the variability between coffees, and then we can make a cumulative impact based on what percentage of the data is represented by each PC.

We can separate Flavors and Cupping Grades to see how they compare. The Flavor metrics need more components to account for their variability. The Flavor metrics reach 90% of variability in 8 PCs out of 13, while the Cupping metrics reach 90% in 4 PCs out of 11. This indicates that the Flavor metrics are better able to uniquely identify coffees.
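Counting how many PCs are needed to reach a variance threshold is a cumulative sum over the explained-variance ratios. The sketch below uses a hypothetical helper, `components_needed`, with illustrative ratios that are not the article’s values.

```python
import numpy as np

def components_needed(explained_ratio, threshold=0.90):
    """Smallest number of leading PCs whose cumulative explained-variance
    ratio reaches `threshold`."""
    cumulative = np.cumsum(explained_ratio)
    # Small tolerance guards against floating-point round-off at the boundary.
    return int(np.searchsorted(cumulative, threshold - 1e-12) + 1)

# Illustrative ratios only (not the article's values).
concentrated = np.array([0.60, 0.20, 0.12, 0.05, 0.03])
spread_out = np.array([0.30, 0.25, 0.20, 0.15, 0.10])

print(components_needed(concentrated))  # 3
print(components_needed(spread_out))    # 4
```

A set of metrics whose variance is spread across many PCs needs more of them to hit the same threshold, which is the pattern reported above for the Flavor metrics.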

We can calculate the cumulative absolute impact of each metric across all PCs, weighted by each PC’s contribution to the data. Cupper’s Correction is the most important while Flavor and Complexity are not as important. For Flavors, Sugars is the least helpful in distinguishing beans while Berry, Citrus, Fruit, and Cocoa are the best.
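The exact formula behind this ranking is not given in the article. One plausible reading, shown here purely as an assumption, is to sum each metric’s absolute loading over all PCs, weighting each PC by its share of the explained variance. The helper name, metric names, and loadings are all hypothetical.

```python
import numpy as np

def metric_impact(components, explained_ratio, names):
    """Rank metrics by the variance-weighted sum of their absolute loadings.

    components      - array (n_pcs, n_metrics), one PC per row
    explained_ratio - fraction of variance explained by each PC
    """
    weighted = np.abs(components) * np.asarray(explained_ratio)[:, None]
    impact = weighted.sum(axis=0)
    order = np.argsort(impact)[::-1]  # largest impact first
    return [(names[i], float(impact[i])) for i in order]

# Hypothetical loadings for three metrics and two PCs (illustration only).
names = ["metric_a", "metric_b", "metric_c"]
components = np.array([
    [0.8, -0.6, 0.0],
    [0.6, 0.8, 0.0],
])
explained_ratio = np.array([0.7, 0.3])

ranking = metric_impact(components, explained_ratio, names)
for name, value in ranking:
    print(f"{name}: {value:.2f}")
```

Taking absolute values means a metric counts as influential whether it pushes a PC up or down; other weightings, such as squared loadings, would be equally defensible.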

We can plot all the data using the first 2 PCs, which contain most of the data variability. Coffees plotted by grade seem more tightly clustered, while the Flavors seem to be more spread out.

The spread of the flavor data is interesting compared to how close most of the coffee grades are together. It’s fun to see how African beans show between Dry and Wet processed. They seem to have the biggest variation.

In this work, I compared coffee grades (Cupping grades) to flavor grades. I found the flavor grades are much better at distinguishing coffees based on region or process, which is a good sign for coffee grades. Grading coffee should be independent of flavors, and the way Sweet Maria’s does cupping doesn’t show a strong bias towards particular flavors. The strongest bias is towards the Floral flavor, but it isn’t as strong as the correlation to other cupping parameters.


I’m in love with my Wife, my Kids, Espresso, Data Science, tomatoes, cooking, engineering, talking, family, Paris, and Italy, not necessarily in that order.