RoboSomm

Wine Embeddings and a Wine Recommender

Quantifying the Sensory Profile of 150,000+ Wines and Building a Wine Recommender Model

Roald Schuring
Towards Data Science
10 min read · May 30, 2019


One of the cornerstones of previous chapters of the RoboSomm series has been to extract descriptors from professional wine reviews, and to convert these into quantitative features. In this article, we will explore a way of extracting features from wine reviews that combines the best of the existing RoboSomm series and academic literature on this topic. We will then use these features to produce a simple wine recommendation engine.

The Jupyter Notebook with all relevant code can be found in this GitHub repository. Our dataset consists of roughly 180,000 professional wine reviews, scraped from www.winemag.com. These reviews span roughly 20 years, dozens of countries and hundreds of grape varieties.

Wine Embeddings

In the following section, we walk through the five steps required to create our ‘wine embeddings’: a 300-dimensional vector for each wine, summarizing its sensory profile. Along the way, we will explain successful approaches others have taken in similar projects. Before we proceed, let’s pick a wine to join us on this journey:

Point & Line 2016 John Sebastiano Vineyard Reserve Pinot Noir

Review: Dried red flowers and sagebrush combine for an elegant aromatic entry to this bottling by two business partners who have worked in Santa Barbara’s restaurant scene for many years. Tarragon and intriguing peppercorn flavors decorate the tangy cranberry palate, which is lightly bodied but very well structured.

Excellent! Time to get stuck in.

Step 1: Normalize words in wine review (remove stopwords, punctuation, stemming)

The first step is to normalize our text. We want to remove stopwords and any punctuation from our raw text. In addition, we will use a stemmer (the SnowballStemmer, available in NLTK) to reduce inflected words to their stem. The Pinot review becomes the following:

dri red flower sagebrush combin eleg aromat entri bottl two bus partner work santa barbara restaur scene mani year tarragon intrigu peppercorn flavor decor tangi cranberri palat light_bodi veri well structur
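
The notebook contains the exact implementation; as a rough sketch, a normalization step along these lines could look as follows, assuming NLTK's stopword list and SnowballStemmer (the review string is just an illustrative excerpt):

import string
from nltk.corpus import stopwords  # requires nltk.download('stopwords')
from nltk.stem.snowball import SnowballStemmer

stemmer = SnowballStemmer('english')
stop_words = set(stopwords.words('english'))

def normalize_review(text):
    # Lowercase, strip punctuation, drop stopwords and stem each remaining word
    text = text.lower().translate(str.maketrans('', '', string.punctuation))
    return [stemmer.stem(w) for w in text.split() if w not in stop_words]

tokens = normalize_review("Dried red flowers and sagebrush combine for an elegant aromatic entry...")
# ['dri', 'red', 'flower', 'sagebrush', 'combin', 'eleg', 'aromat', 'entri']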

Step 2: Enhance the set of normalized words with phrases (bi-grams and tri-grams)

Next, we want to account for the possibility that some of the terms we want to extract from the wine descriptions are actually combinations of words or phrases. Here, we can use the gensim package Phrases to produce a set of bi- and tri-grams for the full corpus. Running our normalized wine review through the phraser consolidates terms such as ‘light’ and ‘bodi’ which are frequently found next to each other to ‘light_bodi’:

dri red flower sagebrush combin eleg aromat entri bottl two bus partner work santa_barbara restaur scene mani_year tarragon intrigu peppercorn flavor decor tangi cranberri palat light_bodi veri well structur
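
A rough sketch of this step with gensim's Phrases/Phraser is shown below; the min_count and threshold values, and the toy corpus, are illustrative rather than the notebook's actual settings:

from gensim.models.phrases import Phrases, Phraser

# One token list per review from Step 1 (toy stand-in; the real corpus has ~180,000 reviews)
normalized_reviews = [
    ['tangi', 'cranberri', 'palat', 'light', 'bodi', 'veri', 'well', 'structur'],
    ['fresh', 'cherri', 'light', 'bodi', 'santa', 'barbara'],
]

bigram = Phraser(Phrases(normalized_reviews, min_count=1, threshold=1))
trigram = Phraser(Phrases(bigram[normalized_reviews], min_count=1, threshold=1))

phrased_reviews = [trigram[bigram[tokens]] for tokens in normalized_reviews]
# Frequently co-occurring pairs such as 'light', 'bodi' become 'light_bodi'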

Step 3: Use the RoboSomm wine wheels to standardize the wine descriptors in each review

Wine reviewers are often creative in their use of language, and sometimes use different words to describe things that are seemingly the same. After all, are ‘wet slate’, ‘wet stone’ and ‘wet cement’ aromas not really manifestations of the same sensory experience? In addition, wine tasting has specific jargon. Terms such as ‘baked’, ‘hot’ or ‘polished’ have a specific meaning in the world of wine tasting.

To standardize wine jargon and creative descriptors, researchers such as Bernard Chen have developed the Computational Wine Wheel. The Computational Wine Wheel categorizes and maps various wine terms that appear in wine reviews to create a consolidated set of descriptors. This great work, together with the contributions of others (e.g. Wine Folly and UC Davis), has been used to generate the RoboSomm wine wheels. These wine wheels were created by looking at a list of the most frequently occurring descriptors in the corpus after going through steps 1 and 2 outlined above. This list was then reviewed manually and mapped onto a set of standardized descriptors. In total, this resulted in a mapping for over 1,000 ‘raw’ descriptors.

The first of the RoboSomm wine wheels is an aroma wheel, which categorizes a variety of aromatic descriptors:

Wine Aroma Wheel

The second wine wheel is a non-aroma wheel, which accounts for other characteristics, such as body, sweetness and acid levels. These descriptors are not typically included in tasting wheels, but are prominent parts of a tasting experience:

Wine Non-Aroma Wheel

We can choose to standardize wine terms at any of the three levels of the wheel, or use the raw descriptor itself (no standardization). For now, we will map the descriptors to the outside layer of the wheel. For the Pinot Noir review we started processing, we obtain the following:

dry red flower sagebrush combin elegant aromat entri bottl two bus partner work santa_barbara restaur scene mani_year tarragon intrigu pepper flavor decor tangy cranberry palat light_bodied veri well structur

The mapped descriptors here are dry, flower, sagebrush, elegant, tarragon, pepper, tangy, cranberry and light_bodied; the remaining terms are either non-informative or ambiguous in the context of this analysis.
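
The full mapping lives in the RoboSomm wine wheel files in the repository. A stripped-down sketch of the standardization step might look like this; the dictionary shown is a tiny illustrative excerpt, not the real 1,000+ term mapping:

# Tiny excerpt of a raw-descriptor -> wheel-descriptor mapping (illustrative only)
descriptor_map = {
    'dri': 'dry',
    'eleg': 'elegant',
    'peppercorn': 'pepper',
    'tangi': 'tangy',
    'cranberri': 'cranberry',
    'light_bodi': 'light_bodied',
    'wet_stone': 'wet_rock',
    'wet_slate': 'wet_rock',
}

def standardize(tokens):
    # Replace a token with its standardized wheel descriptor where a mapping exists
    return [descriptor_map.get(tok, tok) for tok in tokens]

standardize(['dri', 'red', 'flower', 'tangi', 'cranberri', 'light_bodi'])
# ['dry', 'red', 'flower', 'tangy', 'cranberry', 'light_bodied']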

Step 4: Retrieve the Word2Vec word embedding for each mapped term in the review

Next, we need to consider how we will quantify our set of mapped descriptors. A common approach to doing this (and one that was used in previous chapters of the RoboSomm series!) is to represent the absence/presence of each descriptor in the corpus with a 0 or a 1. However, this approach does not take into account semantic (dis)similarities between terms. Tarragon, for instance, is more similar to sagebrush than it is to cranberry. To account for this, we can create word embeddings: vector representations of words and phrases. Researchers such as Els Lefever and her co-authors have taken a similar approach to quantifying wine reviews in their work.

For the purpose of this project, we will use a technique called Word2Vec to generate a 300-dimensional embedding for every mapped term. Since wine jargon is so specific, we have to train our Word2Vec model on a representative corpus. Fortunately, our set of 180,000 wine reviews is exactly that! Having previously mapped our descriptors using our wine wheels, we have already somewhat standardized the wine terms in our corpus. This was done to eliminate unnecessary semantic nuance (e.g. consolidate ‘wet stone’, ‘wet slate’ and ‘wet cement’ to ‘wet rock’), hopefully enhancing the quality of our Word2Vec model.
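
A minimal sketch of training such a model with gensim is shown below, assuming standardized_reviews holds one token list per review after Steps 1 through 3. The hyperparameters are illustrative, and the keyword is vector_size in gensim 4.x (size in older versions):

from gensim.models import Word2Vec

# One token list per review after Steps 1-3 (toy stand-in for the ~180,000-review corpus)
standardized_reviews = [
    ['dry', 'flower', 'sagebrush', 'elegant', 'tarragon', 'pepper', 'tangy', 'cranberry', 'light_bodied'],
    ['fresh', 'raspberry', 'thyme', 'pepper', 'sagebrush', 'dry', 'light_bodied', 'cranberry'],
]

w2v = Word2Vec(sentences=standardized_reviews, vector_size=300,
               window=5, min_count=1, workers=4)

sagebrush_vec = w2v.wv['sagebrush']  # a 300-dimensional numpy array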

Our trained Word2Vec model consists of a 300-dimensional embedding for every term in our corpus. However, as the previous step showed, we only really care about the terms that are relevant descriptors of a wine’s sensory experience.

For our Pinot Noir, these were:

dry, flower, sagebrush, elegant, tarragon, pepper, tangy, cranberry, light_bodied

Each of these mapped descriptors has its own 300-dimensional word embedding in the trained Word2Vec model.

Step 5: Weight each word embedding in the wine review with a TF-IDF weighting, and sum the word embeddings together

Now that we have a word embedding for each mapped descriptor, we need to think about how we can combine these into a single vector. Looking at our Pinot Noir example, ‘dry’ is a fairly common descriptor across all wine reviews. We want to weight that less than a rarer, more distinctive descriptor such as ‘sagebrush’. In addition, we want to take into consideration the total number of descriptors per review. If there are 20 descriptors in one review and five in another, each individual descriptor in the former review probably contributes less to the overall profile of the wine than in the latter. Term Frequency-Inverse Document Frequency (TF-IDF) takes both of these factors into consideration: the term frequency (TF) captures how often a descriptor occurs within a single review, relative to the total number of descriptors in that review, while the inverse document frequency (IDF) down-weights descriptors that appear across many of the 180,000 wine reviews.

Multiplying each mapped descriptor vector by its TF-IDF weighting gives us our set of weighted mapped descriptor vectors. We can then sum these to obtain a single wine embedding for each wine review. For our Pinot Noir, the nine weighted descriptor vectors sum to a single 300-dimensional wine embedding.
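
A rough sketch of this weighting-and-summing step, building on the Word2Vec sketch above and using scikit-learn's TfidfVectorizer, is shown below; the variable names are illustrative and the notebook's exact implementation may differ:

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# One space-joined string of mapped descriptors per review
descriptor_docs = [' '.join(tokens) for tokens in standardized_reviews]

tfidf = TfidfVectorizer()
tfidf_matrix = tfidf.fit_transform(descriptor_docs)
vocab = tfidf.get_feature_names_out()

def wine_embedding(review_idx):
    # Sum the TF-IDF-weighted Word2Vec vectors of one review's descriptors
    row = tfidf_matrix[review_idx]
    vec = np.zeros(w2v.wv.vector_size)
    for term_idx in row.nonzero()[1]:
        term = vocab[term_idx]
        if term in w2v.wv:
            vec += row[0, term_idx] * w2v.wv[term]
    return vec

wine_embeddings = np.vstack([wine_embedding(i) for i in range(len(descriptor_docs))])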

Building a Wine Recommender

Now that we have our wine embeddings, it’s time to have some fun. One of the things we can do is produce a wine recommender system. We can do this by using a nearest neighbors model, which calculates the cosine distance between various wine review vectors. The wine embeddings that lie closest to one another are returned as suggestions.
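
A sketch of such a nearest neighbors lookup with scikit-learn, assuming the wine_embeddings matrix from Step 5 (variable names are illustrative):

from sklearn.neighbors import NearestNeighbors

knn = NearestNeighbors(metric='cosine', algorithm='brute')
knn.fit(wine_embeddings)

# Closest wines to the first wine in the matrix; the nearest neighbour is the wine itself
n = min(4, len(wine_embeddings))
distances, indices = knn.kneighbors(wine_embeddings[0].reshape(1, -1), n_neighbors=n)
for dist, idx in zip(distances[0][1:], indices[0][1:]):
    print(idx, round(float(dist), 3))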

Let’s take a look at what we get as suggestions when we insert our Point & Line Pinot Noir from earlier. Which of the 180,000 possible wines in our dataset are returned as suggestions?

Wine to match: Point & Line 2016 John Sebastiano Vineyard Reserve Pinot Noir (Sta. Rita Hills)
Descriptors: [dry, flower, sagebrush, elegant, tarragon, pepper, tangy, cranberry, light_bodied]
________________________________________________________________
Suggestion 1: Chanin 2014 Bien Nacido Vineyard Pinot Noir (Santa Maria Valley)
Descriptors: [hibiscus, light_bodied, cranberry, dry, rose, white_pepper, light_bodied, pepper, underripe, raspberry, fresh, thyme, oregano, light_bodied, fresh]

Suggestion 2: Hug 2016 Steiner Creek Pinot Noir (San Luis Obispo County)
Descriptors: [fresh, raspberry, thyme, pepper, rosemary, sagebrush, dry, sage, mint, forest_floor, light_bodied, cranberry_pomegranate, tangy]

Suggestion 3: Comartin 2014 Pinot Noir (Santa Cruz Mountains)
Descriptors: [vibrant, tangy, cranberry, hibiscus, strawberry, pepper, brown_spice, pepper, spice, bay_leaf, thyme, herb, underripe, raspberry, cranberry, fruit]

The top three wines returned are all Pinot Noirs from California. Looking at the descriptors for these wines, we can see that they are indeed very similar to our original wine. Cranberry features in every one of the suggestions. Because of the way the wine embeddings have been constructed, the semantic similarity of non-identical terms is also taken into consideration. For example, the word ‘flower’ in the original wine review is similar to ‘hibiscus’ and ‘rose’ in the first suggestion.
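
One quick way to sanity-check this behaviour is to query the trained Word2Vec model directly. The exact similarity values depend on the trained model, so the snippet below simply prints whatever the model learned:

# Compare descriptor pairs in the trained model (guarding against missing vocabulary)
for a, b in [('flower', 'rose'), ('flower', 'cranberry')]:
    if a in w2v.wv and b in w2v.wv:
        print(a, b, round(float(w2v.wv.similarity(a, b)), 3))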

If we look at the top ten wine suggestions for our Point & Line Pinot Noir (see this Jupyter Notebook for the full list), we can see that the recommendations are remarkably consistent. All ten wines come from California, and nine of the ten are Pinot Noirs. Five are even produced within a 60-mile radius of our original wine. The only wine that is not a Pinot Noir is a Cabernet Franc from the Santa Ynez Valley, a mere 25-minute drive from where our Point & Line Pinot is produced. The geographical origin of our Pinot Noir appears to have a very strong effect on its sensory profile, allowing it to be matched with other similar wines in its direct vicinity. A map of these recommendations illustrates just how geographically concentrated they are.

The remarkable performance of this recommender model raises the question: how is it possible that the suggestions returned are so specific to a single geographical area?

At its core, this analysis is entirely dependent on the wine reviews used to construct the wine embeddings. In this post, a taster for the Wine Enthusiast explains how wines are rated on the www.winemag.com website. Although ratings are given through a process of blind tasting, it is not entirely clear whether the text description in the review is also a product of an unbiased evaluation process. It is possible that reviewers, having seen the bottle, consciously or unconsciously attribute certain terms to specific types of wine (e.g. ‘sagebrush’ for Pinot Noirs from Southern California).

On the other hand, it is also entirely possible that these wines truly exhibit sensory profiles that can be attributed to specific grape varieties, terroirs and wine-making styles. The professional reviewers from the Wine Enthusiast may well have such finely-tuned palates that they can pick out these nuances in each wine, without having seen the bottle.

Using Descriptors to Suggest Wines

As a final exercise, we can take a slightly different approach to leveraging our wine recommender. Let’s say that we are looking for a wine with specific characteristics. On a hot summer’s day, we might feel like a wine that is fresh, high in acid, and has aromas of grapefruit, grass and lime. Taking the RoboSomm wine wheels for a spin, we can pick the descriptors that match these characteristics: ‘fresh’, ‘high_acid’, ‘grapefruit’, ‘grass’ and ‘lime’.
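
One way to build such a query, sketched here as an assumption rather than the notebook's exact approach, is to weight each chosen descriptor's Word2Vec vector by its corpus IDF value, sum them, and feed the result to the same nearest neighbors model:

import numpy as np

query_terms = ['fresh', 'high_acid', 'grapefruit', 'grass', 'lime']

# IDF values learned by the TfidfVectorizer in Step 5
idf = dict(zip(tfidf.get_feature_names_out(), tfidf.idf_))

query_vec = np.sum(
    [idf.get(t, 1.0) * w2v.wv[t] for t in query_terms if t in w2v.wv],
    axis=0)

n = min(3, len(wine_embeddings))
distances, indices = knn.kneighbors(query_vec.reshape(1, -1), n_neighbors=n)
print(indices[0])  # indices of the closest wine embeddings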

Feeding these descriptors into the wine recommender, we get the following suggestions:

Suggestion 1: Undurraga 2011 Sibaris Reserva Especial Sauvignon Blanc (Leyda Valley)
Descriptors: [minerality, zesty, crisp, grass, lime, grapefruit, lemongrass, angular]

Suggestion 2: Santa Rita 2012 Reserva Sauvignon Blanc (Casablanca Valley)
Descriptors: [snappy, pungent, gooseberry, grapefruit, lime, racy, lime, grapefruit, nettle, pith, bitter]

Suggestion 3: Luis Felipe Edwards 2015 Marea Sauvignon Blanc (Leyda Valley)
Descriptors: [punchy, grass, citrus, tropical_fruit, fruit, angular, fresh, minerality, tangerine, lime, lemon, grapefruit, tangy]

All three of the wine recommendations are Chilean Sauvignon Blancs, with two coming from the Leyda Valley. Once again, it is noteworthy how geographically concentrated the suggestions are, especially considering that the wine recommender has 180,000 different wines to choose from!

Conclusion

There is no shortage of ways in which we can use our wine embeddings. Our simple wine recommender model suggests that it may be worth further investigating wine styles through the lens of geography. What is the influence of terroir vs. wine-making style? Do geographical differences establish themselves in the same ways for different grape varieties? Perhaps we can also learn more about the process by which wine reviews are written and the extent to which biases drive the use of certain descriptors.
