Coffee Data Science

Extraction over the Life of the Coffee Bean

Data on extraction vs coffee roast age

Robert McKeon Aloe
Towards Data Science
9 min read · Oct 16, 2020

I’ve been collecting data on my shots for almost 2 years. I wanted to look through that information for trends, particularly how taste and extraction change with the age of a roast. The main caveats are:

  1. My methods have improved over time.
  2. My tastes have changed.
  3. My tasting palate has expanded.

So I normalized the data to give a fairer comparison. I found taste and extraction improved over time up to 5 weeks post-roast. I did not find a taste drop-off past three weeks, which is what is typically suggested by roasters. However, I don’t have much data past 5 weeks because my roasts are usually around 1 lb in weight, and I drink too much espresso to have a roast hang around that long.

Roasting/Storage/Extraction

Roasting

I roast once a week using a Hottop roaster that I overfill to 350g. They say the max is 250g, but for a medium roast, you can go higher. I put the beans in when I start the machine instead of waiting for the drum to come to temperature. I’m aware that is not standard, and I’m okay with that. I have yet to do a side-by-side comparison of charge temperature, but I’m sure I will one day.

All images by author

When I started collecting data, I was ending the roast at 1 minute past the First Crack (FC). Now, I’m at 1:30 past FC. I used to blend beans before the roast, and now I do it after the roast.

Storage

Most of these beans were stored in sealed plastic or glass jars, but they were not vacuum sealed. They were stored in a kitchen cabinet and were exposed to light. While people have said that exposure to light is not good for coffee, I haven’t seen data to support or reject that claim.

Extraction

I am a fan of lever machines, and the majority of these shots were pulled on the Kim Express. Some were pulled on La Pavoni, Flair, Kompresso, Enrico of Italy, and La Peppina. I started with a 10-second pre-infusion, but over the past two years, I’ve moved to a 30-second pre-infusion. I also started to pressure pulse a year ago, and many shots have been staccato or staccato tamped. Six months ago, I did quite a bit of work on warming the beans before grinding and cooling the grinds before brewing (aka Spicy Grinding). On top of all of this, I’ve been using paper filters for almost a year.

So there are quite a few changes, but I believe some normalization helps compare changes within a roast as it ages.

Metrics of Performance

I used two metrics for evaluating the differences between shots: Final Score and Coffee Extraction.

Final Score is the average of a scorecard of 7 metrics (Sharp, Rich, Syrup, Sweet, Sour, Bitter, and Aftertaste). These scores were subjective, of course, but they were calibrated to my tastes and helped me improve my shots. There is some variation in the scores. My aim was to be consistent for each metric, but sometimes the granularity was difficult and affected the final score.
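As a minimal sketch of that scorecard average: the metric names come from the list above, while the function name and the example values are hypothetical, and the score scale is an assumption.

```python
from statistics import mean

# The seven scorecard metrics named in the article.
METRICS = ("Sharp", "Rich", "Syrup", "Sweet", "Sour", "Bitter", "Aftertaste")

def final_score(scorecard):
    """Average the seven metric scores; raise KeyError if any metric is missing."""
    missing = [m for m in METRICS if m not in scorecard]
    if missing:
        raise KeyError(f"missing metrics: {missing}")
    return mean(float(scorecard[m]) for m in METRICS)

# Hypothetical scorecard for one shot (values made up for illustration).
shot = {"Sharp": 7, "Rich": 8, "Syrup": 6, "Sweet": 7,
        "Sour": 6, "Bitter": 7, "Aftertaste": 8}
print(final_score(shot))  # 7.0
```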

Total Dissolved Solids (TDS) is measured using a refractometer. Combined with the output weight of the shot and the input weight of the coffee, it is used to determine the percentage of coffee extracted into the cup, called Extraction Yield (EY).
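For reference, the usual simple form of this calculation is EY = TDS × output weight / input weight. Here is a small sketch of it; the function name and the example numbers are illustrative and not taken from my data sheet.

```python
def extraction_yield(tds_percent, output_g, input_g):
    """Return extraction yield (%) from TDS (%), shot weight (g), and dose (g)."""
    if input_g <= 0:
        raise ValueError("input_g must be positive")
    return tds_percent * output_g / input_g

# Illustrative shot: 18 g in, 36 g out, refractometer reads 10% TDS.
ey = extraction_yield(tds_percent=10.0, output_g=36.0, input_g=18.0)
print(f"EY = {ey:.1f}%")  # EY = 20.0%
```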

Data

I ended up with 1200+ data points, and I had to clean the data before compiling the figures below. I threw out shots I had at cafes and a few shots from odd experiments.

The data table was challenging to put together because of Numbers (Mac). I store my data in Numbers sheets where each column is an espresso shot. I used columns instead of rows because this made viewing the sheet on my phone easier. However, until recently, the maximum number of columns was a few hundred, so my data was split across a few tables. I had also added and moved rows over time to improve my workflow, so I had to make a common table that collected the important bits of all the other tables.

Not all the shots had EY, but I still wanted to look at the Final Score for as much data as possible. Below is a large view of a part of the table. I have input metrics, output metrics, and calculated metrics. I have color coding in there to be able to see very quickly how scores compare to one another. I also keep track of notes on the current shot and what to modify for the next shot.

Sample of the Size of the Data Sheet

First, I plotted everything as a scatter plot, not normalized. I only split out Staccato and Regular, but there didn’t seem to be much of a difference except in EY. I haven’t regularly done Staccato shots for the past six months, and that has been the time I’ve been able to drive EY higher using longer pre-infusion. Final Score seems to go up over time as does EY.

You can observe that I don’t have much data for roasts older than 6 weeks. I have some data points because I went out of town for a month, and didn’t bring the beans with me.

Extraction time is high in the first week or two, but after that it doesn’t change much. I did notice that once a roast is past two months, even a very hard tamp doesn’t have much effect; the shot will run fast.

I then normalized the data by roast using Z-score normalization. This puts the scores for each roast on the same scale, with a mean of 0 and a standard deviation of 1. So if points start to shift positive as the roast ages, it means the distribution is shifting. The issue with the scatter plots is that they don’t tell the story well.
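A rough sketch of per-roast Z-score normalization is below. The function name and the (roast, value) layout are hypothetical, and using the population standard deviation (and mapping a roast with no spread to 0) is an assumption rather than a record of what I did in my sheet.

```python
from collections import defaultdict
from statistics import mean, pstdev

def zscore_by_roast(shots):
    """shots: list of (roast_id, value). Returns z-scores in the same order."""
    groups = defaultdict(list)
    for roast_id, value in shots:
        groups[roast_id].append(value)

    # Mean and population standard deviation for each roast.
    stats = {r: (mean(v), pstdev(v)) for r, v in groups.items()}

    scores = []
    for roast_id, value in shots:
        mu, sigma = stats[roast_id]
        # A roast whose shots all scored the same has no spread; use 0.0.
        scores.append(0.0 if sigma == 0 else (value - mu) / sigma)
    return scores

# Made-up example: roast "A" has two shots, roast "B" has two identical shots.
print(zscore_by_roast([("A", 6.0), ("A", 8.0), ("B", 4.0), ("B", 4.0)]))
# [-1.0, 1.0, 0.0, 0.0]
```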

So let’s move to box plots. This is a legend for those unfamiliar with box plots:

Here are the box plots without normalization. They show a trend of increasing scores and EY over time. This could be misleading because at the beginning of this data journey, I would roast and drink a roast within two weeks. Now, my roasts sit at least 2 weeks before I brew.

Let’s normalize the data by each roast. There seems to be a shift upwards in the distributions for both Final Score and EY, but not as much for shot time. Shot time goes down especially past 7 weeks.

We can also do MinMax normalization to force all scores between 0 and 1. We can do this for each roast and then combine them. Higher score is better.
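Here is a minimal sketch of MinMax normalization applied roast by roast. The function name and data layout are hypothetical, and sending a roast with only one distinct value to 0.0 is an assumption.

```python
from collections import defaultdict

def minmax_by_roast(shots):
    """shots: list of (roast_id, value). Returns values scaled to [0, 1] per roast."""
    groups = defaultdict(list)
    for roast_id, value in shots:
        groups[roast_id].append(value)

    # Minimum and maximum value observed for each roast.
    bounds = {r: (min(v), max(v)) for r, v in groups.items()}

    scaled = []
    for roast_id, value in shots:
        lo, hi = bounds[roast_id]
        # With no range there is nothing to scale; use 0.0 (an assumption).
        scaled.append(0.0 if hi == lo else (value - lo) / (hi - lo))
    return scaled

# Made-up example: three shots from roast "A" and a single shot from roast "B".
print(minmax_by_roast([("A", 5.0), ("A", 7.0), ("A", 9.0), ("B", 3.0)]))
# [0.0, 0.5, 1.0, 0.0]
```

Once each roast is scaled this way, the values from different roasts can be pooled by days post-roast.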

We see a trend towards higher scores for taste and EY using this type of normalization to combine data.

Looking at just the median for each box per day post-roast, both metrics follow rough trends, but there is some noise, which I attribute to the days post-roast not being consistent. Not every roast has a data point for every day. However, I wanted to try to see what I could do to understand the trends of taste and extraction post-roast.
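Taking the median per day post-roast can be sketched as follows, assuming the normalized scores are available as (day, score) pairs. The function name and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import median

def median_by_day(points):
    """points: iterable of (days_post_roast, normalized_score) pairs."""
    by_day = defaultdict(list)
    for day, score in points:
        by_day[day].append(score)
    # Days with no shots simply do not appear, which is one source of noise.
    return {day: median(scores) for day, scores in sorted(by_day.items())}

# Made-up example: three shots on day 3, one shot on day 10.
print(median_by_day([(3, 0.2), (3, 0.6), (10, 0.5), (3, 0.4)]))
# {3: 0.4, 10: 0.5}
```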

Compared to Q-Scores

Do my taste scores align with Q-Scores? Luckily, I also have some data for that question. Over the past two years, I have also been collecting data on my roasts and their average grades. I assume blending beans results in an averaging of the individual bean grades, and I want to know if this merged Q-score could give me a good indication of whether a roast will be good.

Here is a sample of my roast data sheet below:

Sample of Roast Data Sheet

Here is a sample of how I compute the average Q-score based on individual bean scores. The first two didn’t have names or the output weight.
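One way to sketch the blend average is shown below. Weighting each bean's Q-score by its weight in the blend is an assumption here; a plain unweighted average is the other obvious choice. The function name and the example numbers are hypothetical.

```python
def blend_q_score(beans):
    """beans: list of (q_score, weight_g). Returns the weight-averaged Q-score."""
    total_weight = sum(weight for _, weight in beans)
    if total_weight <= 0:
        raise ValueError("total bean weight must be positive")
    return sum(q * weight for q, weight in beans) / total_weight

# Made-up blend: 100 g of a bean graded 8.0 and 300 g of a bean graded 9.0.
print(blend_q_score([(8.0, 100.0), (9.0, 300.0)]))  # 8.75
```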

Close up view of Roast Data Sheet

My roast blends ranged from 8.1 to 8.9 in Average Q-score. The histogram below shows how the data is distributed.

Each roast has a single Average Q-score, so normalizing shot data per roast would erase the differences between roasts that I want to compare. Instead, I normalized shot data across 3 to 5 roasts: 2 roasts prior (if they exist), the roast in question, and 2 roasts afterwards (if they exist). I normalized the scores using MinMax normalization.
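My reading of that windowed normalization, as a sketch: for each roast, take the minimum and maximum over the shots from the roast itself and up to two roasts on either side, then scale only that roast's shots by those bounds. The function name, the parameter `half_window`, and the handling of empty or flat windows are assumptions.

```python
def windowed_minmax(roasts, half_window=2):
    """roasts: list of score lists in chronological roast order.

    Returns one list per roast, scaled by the min/max of a window that
    covers up to `half_window` roasts before and after it.
    """
    scaled = []
    for i, scores in enumerate(roasts):
        start = max(0, i - half_window)
        window = [s for roast in roasts[start:i + half_window + 1] for s in roast]
        if not window:
            scaled.append([])
            continue
        lo, hi = min(window), max(window)
        span = hi - lo
        # A flat window has nothing to scale; use 0.0 (an assumption).
        scaled.append([0.0 if span == 0 else (s - lo) / span for s in scores])
    return scaled

# Made-up example with one roast on each side to keep it short.
print(windowed_minmax([[4.0, 6.0], [5.0], [8.0], [2.0, 10.0]], half_window=1))
# [[0.0, 1.0], [0.25], [0.75], [0.0, 1.0]]
```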

Boxplots can give some information about the trend, but I have paid more attention to the Q-score in buying this past year, so it could be biased by what I’m buying.

Looking at MinMax normalizations, there seems to be a slight upward trend for Final Score but not EY.

We can simplify this view by looking at the average score per Q-score bin, which shows a very slight trend. However, the number of data points is very small, so I wouldn’t put much weight on it. I had hoped to see a clearer trend, but as is often the case with data, the many variables can get in the way of each other.
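A small sketch of the per-bin average is below. Binning by rounding the Average Q-score to one decimal place is an assumption, and the function name and sample values are hypothetical. The count is returned alongside the mean because a bin with only a few points deserves less trust.

```python
from collections import defaultdict
from statistics import mean

def mean_by_q_bin(rows):
    """rows: iterable of (avg_q_score, normalized_score) pairs.

    Returns {q_bin: (mean_score, count)}, where q_bin is the Q-score
    rounded to one decimal place.
    """
    bins = defaultdict(list)
    for q_score, score in rows:
        bins[round(q_score, 1)].append(score)
    return {q: (mean(s), len(s)) for q, s in sorted(bins.items())}

# Made-up example: two roasts fall in the 8.1 bin and one in the 8.5 bin.
for q_bin, (avg, n) in mean_by_q_bin([(8.12, 0.4), (8.08, 0.6), (8.51, 0.9)]).items():
    print(q_bin, round(avg, 3), n)
```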

Overall, I saw an upward trend for both taste and extraction yield as a roast aged, and I suspect there is a correlation between the two. I didn’t see as strong a trend between Q-grade and taste, which is probably due to improvements in my brewing over the same period. I would expect a clearer trend if more of the technique variables were controlled.

I’m most excited that I can always do this analysis again, and I’m curious how more data could lead to a better understanding of when roasts peak in flavor. The peak flavor time for me was between 3 and 5 weeks, which is a longer window than the typical 2 to 3 weeks suggested by roasters. For extraction yield, I have not seen a degradation over time. Typically, EY continues to go up and then hits a plateau.

For roasts past 5 weeks, there is a hint of staleness, especially in the darker roasts and the roasts that went out to 3 months, but the shots were still enjoyable, in part because the higher EY potential offset some of the stale flavors.

I’m in love with my Wife, my Kids, Espresso, Data Science, tomatoes, cooking, engineering, talking, family, Paris, and Italy, not necessarily in that order.