
How I used data science to select champagne for an important event

Warning: You might feel thirsty after reading this article

Image source - wine-searcher

I had an important event coming up. The guest list included connoisseurs of fine champagne, so there was no room for error in the selection. I had two ways to go about it.

  1. The usual way – go to a champagne cave, taste a few, ask the cave owner for advice and pick what I like. But I can only get what that cave stocks, and no cave in the world carries all 10K+ champagnes.
  2. The Data Science way – use data and algorithms to help in selection.

I had already had a bad experience with method 1. Either I do not have good taste in champagne, or the cave owners sold me whatever they wanted to promote.

So I decided to go for method 2. Its other advantage was that I could impress my guests with a good data story if everything worked out. A cutting-edge discussion on AI and data science over a glass of champagne sounds very cool.

So here is my journey of selecting champagne using some cool data science stuff.

Before we start, just a short credit to all sources I have used: Champagne database – https://www.wine-searcher.com

Where to start

Just as Scotch always comes from Scotland, champagne is always French. It takes its name from the Champagne region of France, where it is produced. The number of champagnes runs well into the thousands, so the first step was to make a list of them all. Fortunately, there is a champagne database (available at https://www.wine-searcher.com), which is updated monthly.

I was able to download a list of 10000+ champagnes. Quite a long list. A screenshot of the data is shown here.

Snapshot of champagne database download.

The database had some good information, such as:

Grapes – The type of grape gives a very good indication of the taste and feel of a champagne.

Popularity – A kind of sales rank that indicates how often the champagne is bought by customers.

Score – An indicator of how expert champagne tasters have rated the champagne.

Price – Price in euros for a 750 ml bottle of champagne.

I think I had good information to get started with applying data science.

Deciding a selection strategy

The next step was to establish a selection strategy. Tasting all 10K+ champagnes was a nice idea, but not feasible. So here are the two basic strategies I thought of:

  1. See if there are rare ones in the data, such as champagnes with a high score or high popularity but a low price.
  2. Make groups of similar champagnes based on the columns and select from each group. This ensures a good overall selection.

Strategy – Looking for the Rare Ones

I decided to look for rare ones using three fields: Popularity, Score and Price. But first it was important to check whether any of these fields are correlated. What if low-priced champagnes are simply the popular ones? If Popularity and Price were strongly correlated, I would only need to keep one of them.

The correlation value came out at -0.16, which means there is only a weak correlation between the two.

Price vs Popularity correlation

So now I can safely use the three fields, as Price and Popularity show little overlap.

In data science terminology, rare means outliers. Finding outliers means finding uncommon combinations of Popularity, Score and Price. Spotting such combinations by hand across 10K+ rows and 3 fields is a hard problem. Fortunately, data science gives us outlier detection algorithms.
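The article does not say which outlier detection algorithm was used. As one simple stand-in, the sketch below standardizes each column and flags rows that lie far from the centre across all three fields at once. The function names, the threshold of 3 and the sample rows are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, pstdev

def zscores(values):
    """Express each value as a number of standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def multivariate_outliers(rows, threshold=3.0):
    """Return the indices of rows whose combined distance from the centre,
    measured in standard deviations over all columns, exceeds `threshold`."""
    z_columns = [zscores(col) for col in zip(*rows)]
    flagged = []
    for i, z_row in enumerate(zip(*z_columns)):
        distance = sqrt(sum(z * z for z in z_row))
        if distance > threshold:
            flagged.append(i)
    return flagged

# Hypothetical rows of (popularity rank, score, price in euros).
rows = [
    (100, 88, 40), (110, 89, 45), (90, 87, 35), (105, 88, 42), (95, 89, 38),
    (100, 87, 40), (110, 88, 44), (90, 89, 36), (100, 87, 40),
    (5, 97, 150),  # very popular, very high score, unusual price
]

flagged = multivariate_outliers(rows)
print("Outlier rows:", [rows[i] for i in flagged])
```

This flags anything unusual in any direction, so the flagged rows still need a look to see which ones are unusual in a good way. The threshold would also need tuning on a real 10K+ row table.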

Here is the result of outlier detection.

Outlier detection

The algorithm found 30 outliers. That means I could shortlist 30 champagnes out of 10K+. Wow! This result was mind-blowing. I would never have managed to reduce a list of 10K+ champagnes to 30 manually in Excel.

I can reduce the list further by looking at the usual range and the unusual values in individual columns. Here are the results of the outlier analysis on individual columns.

For column Score, values above 94 are exceptional values.

For column Popularity, there are no exceptional values.

For column Price, values above 400 can be considered as very high price.
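One common way to decide what counts as an exceptional value in a single column is the box-plot rule (Tukey's fences): anything more than 1.5 times the interquartile range beyond the quartiles is flagged. Whether the cut-offs of 94 and 400 quoted here came from this rule is not stated, so treat the following as a sketch with made-up scores and a hypothetical helper name.

```python
from statistics import quantiles

def tukey_fences(values, k=1.5):
    """Return (lower, upper) bounds; values outside them count as unusual."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: spread of the middle 50%
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical Score column, not the real data.
scores = [86, 87, 88, 88, 89, 89, 90, 90, 91, 97]

lower, upper = tukey_fences(scores)
exceptional = [s for s in scores if s > upper]
print(f"usual range: {lower} to {upper}; exceptional: {exceptional}")
```

A column where nothing falls outside the fences, like Popularity above, simply has no exceptional values under this rule.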

So within the outliers, filtering further by Score > 94 and Price < 400, I came down to 2 champagnes. Voilà! Here are my rare finds.

Hidden nuggets
Image source - wine-searcher

Then I searched the internet and found exceptional reviews.

"One of the Top 1% champagne in the world". "At this quality, the price could be 3 times more".

I knew I had found some rare champagnes using data science. Without it, I would have spent countless hours on the internet with no confidence in my search.

Now on to the second strategy.

Strategy – Make groups of similar champagnes

In this strategy, the plan was to reduce the list by making groups. That way I can look at a few groups rather than 10K+ individual lines. Then I can select a few champagnes from a group.

This time I decided to use the following fields: Grapes, Popularity, Score, Price.

In data science, making groups is also called clustering or segmentation.
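Grapes is a text field, while clustering methods that rely on distances need numbers. How the tool behind this article handled that is not stated. One widely used option, assumed here purely for illustration, is one-hot encoding, where each grape type becomes its own 0/1 column.

```python
# Hypothetical Grapes column for five champagnes.
grapes = ["Chardonnay", "Pinot Noir", "Champagne Blend", "Pinot Meunier", "Chardonnay"]

# One column per distinct grape type, in alphabetical order.
categories = sorted(set(grapes))

# Each row gets a 1 in the column of its own grape type and 0 elsewhere.
one_hot = [[int(grape == category) for category in categories] for grape in grapes]

print(categories)
for grape, encoded in zip(grapes, one_hot):
    print(f"{grape:16} {encoded}")
```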

The first step was to determine how many groups to make. I used clustering to make 2, 3, 4 and 5 groups and then looked at how the champagnes were distributed across the groups. Here are the results of the analysis on the number of clusters.
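The clustering algorithm is not named in this article. The sketch below assumes plain k-means on standardized numeric columns and prints the cluster sizes for 2 to 5 groups. The sample rows, the helper names and the fixed random seed are assumptions, and the Grapes column is left out to keep the example short.

```python
import random
from collections import Counter
from math import dist
from statistics import mean, pstdev

def standardize(rows):
    """Rescale every column to mean 0 and standard deviation 1."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [tuple((v - mu) / sd for v, (mu, sd) in zip(row, stats)) for row in rows]

def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means (Lloyd's algorithm); returns one cluster label per point."""
    centroids = random.Random(seed).sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(mean(col) for col in zip(*members))
    return labels

# Hypothetical rows of (popularity rank, score, price in euros).
rows = [
    (10, 86, 40), (15, 87, 45), (12, 85, 38), (20, 86, 50),
    (400, 86, 35), (450, 85, 30), (420, 87, 42), (480, 86, 33),
    (200, 95, 150), (220, 96, 180), (180, 94, 140), (210, 97, 200),
]
points = standardize(rows)

cluster_sizes = {
    k: sorted(Counter(kmeans(points, k)).values(), reverse=True)
    for k in range(2, 6)
}
for k, sizes in cluster_sizes.items():
    print(f"{k} clusters -> sizes {sizes}")
```

Standardizing first keeps Price, which has the widest range, from dominating the distance calculation. Very small clusters at a given k are a hint that the data does not really split that many ways.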

Cluster size analysis

I decided to go with 3 clusters, as each of the three held a significant share of the champagnes. Here is the result of clustering with 3 clusters.

Clustering result

The results are interesting, so let me explain. First, the output indicates that the columns driving the clusters are Popularity, Score and Price. This suggests that grape types are spread fairly evenly across all clusters and do not really play a role in grouping the data.

Second, there are three clusters, which can be read as follows.

Just a note here that a low Popularity number means a high rank.

  • Cluster 0 – Champagnes with Low Popularity number (high rank) and Low Score
  • Cluster 2 – Champagnes with High Popularity number (low rank) and Low Score
  • Cluster 1 – Champagnes with High Score
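A reading like this usually comes from comparing the average of each column per cluster. The sketch below shows that step on a handful of made-up rows. The labels and values are hypothetical and only mimic the pattern described in the list.

```python
from collections import defaultdict
from statistics import mean

columns = ("popularity", "score", "price")

# Hypothetical rows of (cluster label, popularity rank, score, price in euros).
labelled = [
    (0, 10, 86, 40), (0, 20, 87, 50),
    (1, 200, 95, 150), (1, 220, 97, 190),
    (2, 400, 86, 35), (2, 480, 85, 31),
]

# Collect the feature rows belonging to each cluster.
groups = defaultdict(list)
for label, *features in labelled:
    groups[label].append(features)

# Average every column within each cluster.
profile = {
    label: {name: mean(col) for name, col in zip(columns, zip(*rows))}
    for label, rows in sorted(groups.items())
}
for label, averages in profile.items():
    print(f"Cluster {label}: {averages}")
```

Clusters whose averages differ sharply on a column are the ones that column helps to separate.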

I then decided to take Cluster 1 (High Score), filter it by rank and keep Price < 400 (as in the first strategy). That gave me a shortlist of about a dozen champagnes. And the beauty is that the rare ones (Pierre Peters Grand Cru and Roederer Rosé) which I found with the first strategy also made this list.
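As a rough illustration of this last filtering step, the sketch below keeps only Cluster 1 rows under the price limit and orders them by popularity rank. The bottle names and values are placeholders, and sorting by rank is an assumption about what filtering by rank meant in practice.

```python
# Hypothetical candidates; names and numbers are placeholders.
candidates = [
    {"name": "Bottle A", "cluster": 1, "popularity": 120, "score": 95, "price": 85.0},
    {"name": "Bottle B", "cluster": 1, "popularity": 40, "score": 96, "price": 450.0},
    {"name": "Bottle C", "cluster": 0, "popularity": 10, "score": 88, "price": 35.0},
    {"name": "Bottle D", "cluster": 1, "popularity": 60, "score": 94, "price": 140.0},
    {"name": "Bottle E", "cluster": 2, "popularity": 900, "score": 87, "price": 30.0},
]

MAX_PRICE = 400  # euros, the same limit as in the first strategy

# Keep high-score cluster members under the price limit,
# then put the most popular (lowest rank number) first.
shortlist = sorted(
    (c for c in candidates if c["cluster"] == 1 and c["price"] < MAX_PRICE),
    key=lambda c: c["popularity"],
)
names = [c["name"] for c in shortlist]
print(names)
```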

I also see that all grape types are represented, which is good, as my guests will have a good variety to choose from.

Chardonnay is the world’s most famous white-wine grape. Foods that go with it are butternut squash risotto (risotto alla zucca), Japanese-style pork belly, and roast chicken with honey-sesame carrots.

Pinot Noir’s essence is its aroma of red berries and cherry. Foods that go with it are pappardelle pasta with a porcini ragu, roasted duck breast with plum sauce, and seared chicken livers on toast.

Pinot Meunier (historically just Meunier) is a dark-berried grape variety. Foods that go with it are tuna rillettes, salt and pepper squid, and prawns steamed in banana leaves.

Champagne Blend is a mix of the three above.

Image source - wine-searcher

Grape Types

I decided to go for a few of each grape type.

So voilà, here is my final choice, taking into account strategy 1 + strategy 2.

Image source - wine-searcher

The next step was to go on a tasting spree. Luckily I had a shortlist and did not have to do random tasting based on whatever the salesperson had to offer.

On the day of the event, I had all my champagnes, selected through data science, chilled and ready to be served. All the guests really loved them! And they were thrilled to learn how I had used some cool algorithms to make the choice.

Image source - wine-searcher

Celebration time

Now I have become the go-to guy for champagne selection amongst my friends. I do it on one condition – I will use data science and I will do the tasting spree!!!

Cheers!!!

Additional resources

Website

You can visit my website to do analytics with zero coding: https://experiencedatascience.com

Please subscribe to stay informed whenever I release a new story.

Get an email whenever Pranay Dave publishes.


YouTube channel

Here is the link to my YouTube channel: https://www.youtube.com/c/DataScienceDemonstrated

