RoboSomm

Food and Wine Pairing

Unpacking and Matching the Characteristics of Food and Wine using Natural Language Processing

Published in Towards Data Science, Dec 23, 2019

Pairing wine with food is something of a dark art. What ultimately makes for a great pairing is a delicate balance between the body, non-aroma and aroma characteristics of the wine and of the food. In this article, we will use data science techniques and the prevailing theory on wine/food pairing to build a wine pairing engine. You can find the Jupyter Notebooks with all relevant code here and here.

To accompany us on this journey, we will bring along a trusty and unassuming friend: the Chicago-style hotdog. This culinary mainstay of every sports venue in the greater Chicago area is typically accompanied by a plastic cup of watery beer. Can RoboSomm help us out and choose a wine for us to enjoy with this instead?

Extracting Food Characteristics

Before we can even think about a pairing, we need to pick apart the attributes of our hotdog. How can we quantify the non-aroma and aroma characteristics of our dish so that we can match it up with wines along these same dimensions?

Our first step here is to train a Word2Vec model to generate a 300-dimensional word embedding for a wide range of different food-related terms. Provided that we can find a sufficiently expansive and descriptive corpus of text to train this model on, we would expect these embeddings to capture variation in the characteristics of our food. Fortunately, the Amazon Fine Foods Dataset is exactly that: roughly 500,000 reviews for a plethora of food items.

After training this model, we can calculate an embedding for our hotdog. We will do this by breaking it down into its individual components: hotdog, tomato, onion, pickle, relish, celery salt, sport peppers and mustard. We will assume that the fully loaded hotdog can be represented by the average of the word embeddings of these component ingredients.
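The averaging step can be sketched in a few lines. This is a minimal illustration, not the notebook's code: the 3-dimensional vectors below are made-up stand-ins for the 300-dimensional Word2Vec embeddings, and `toy_embeddings` and `average_embedding` are hypothetical names.

```python
# Toy 3-dimensional vectors stand in for the 300-dimensional Word2Vec
# embeddings. The numbers are invented purely for illustration.
toy_embeddings = {
    "hotdog":  [0.9, 0.1, 0.3],
    "tomato":  [0.2, 0.8, 0.1],
    "onion":   [0.4, 0.5, 0.2],
    "pickle":  [0.3, 0.9, 0.0],
    "mustard": [0.5, 0.6, 0.4],
}

def average_embedding(terms, embeddings):
    """Return the element-wise mean of the vectors of all known terms."""
    vectors = [embeddings[t] for t in terms if t in embeddings]
    if not vectors:
        raise ValueError("none of the terms has an embedding")
    # zip(*vectors) regroups the values by dimension.
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]

hotdog_vector = average_embedding(
    ["hotdog", "tomato", "onion", "pickle", "mustard"], toy_embeddings
)
```

Terms missing from the vocabulary are simply skipped here, which is one possible design choice; the original notebooks may handle them differently.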

Generating normalized non-aroma values for food (illustrated in two dimensions)

To quantify the non-aroma attributes, we will use the trained Word2Vec model to define an embedding for each of the non-aromas (body, sweet, acid, salt, piquant, fat and bitter). Then, we will calculate the cosine distance between that non-aroma embedding and a range of example foods, as in the example above for saltiness. This distance is normalized between 0 and 1 using a MinMax Scaler, with the scale anchored at one end by the food that is closest to the salt embedding (bacon) and at the other end by the food that is farthest from it (raspberry). A simplified version of this process is shown in the diagram above. Our hotdog ranks quite highly on the saltiness scale, with a value of 0.9, so a higher value here indicates a food that sits closer to salt.
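As a rough sketch of this scoring step, the snippet below computes cosine distances to a salt vector and rescales them with min-max normalization. The 2-dimensional vectors are invented, and the orientation (closest food scores 1, farthest scores 0) is an assumption chosen so that a salty food receives a high value, as the hotdog does in the text.

```python
import math

def cosine_distance(u, v):
    """Cosine distance = 1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Made-up 2-D vectors; real embeddings would have 300 dimensions.
salt = [1.0, 0.0]
foods = {"bacon": [0.9, 0.1], "hotdog": [0.8, 0.3], "raspberry": [0.1, 0.9]}

distances = {name: cosine_distance(vec, salt) for name, vec in foods.items()}
d_min, d_max = min(distances.values()), max(distances.values())

# Assumption: invert the scaled distance so the closest food (bacon) scores 1
# and the farthest food (raspberry) scores 0.
saltiness = {
    name: (d_max - d) / (d_max - d_min) for name, d in distances.items()
}
```

With these toy vectors, bacon and raspberry mark the two ends of the scale and the hotdog lands between them, near the salty end.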

Going through these steps for each non-aroma gives us a rough flavor profile for our hotdog. The most notable non-aroma is saltiness. There is also significant acidity, likely due to the relish, pickle and sport peppers. The hotdog is also spicy, fatty and full-bodied, but ranks low on bitterness and sweetness.

Extracting Aromas and Non-Aromas for our Wines

To create our wine pairings, we will also need to map out the aromas and non-aromas for different types of wine. The process by which this is done is very similar to the approach taken in a previous chapter of the RoboSomm series, with a couple of key differences. We extract information from roughly 150,000 professional wine reviews mined from www.winemag.com and train a Word2Vec model on the wine-related terms in the corpus. Importantly, we distinguish between different classes of terms, labeling whether each pertains to an aroma or a non-aroma. An example of this is outlined below.

For each wine, we can calculate a TF-IDF-weighted average embedding per category. Where a wine review contains no descriptors for a given attribute, we plug in the average embedding of that attribute across all wines in our dataset.
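A minimal sketch of the weighted average and the fallback might look like this. The descriptors, their 2-D vectors, the TF-IDF weights and the dataset-wide mean are all invented, and `attribute_embedding` is a hypothetical helper; in practice the weights would vary per review.

```python
def weighted_average(vectors, weights):
    """Average the vectors dimension by dimension, weighting each vector."""
    total = sum(weights)
    return [
        sum(w * x for w, x in zip(weights, dim)) / total
        for dim in zip(*vectors)
    ]

def attribute_embedding(descriptors, embeddings, tfidf, fallback):
    """TF-IDF-weighted mean of a review's descriptors for one attribute.

    Falls back to the dataset-wide mean when the review has no usable
    descriptor for this attribute.
    """
    known = [d for d in descriptors if d in embeddings]
    if not known:
        return fallback
    return weighted_average(
        [embeddings[d] for d in known], [tfidf[d] for d in known]
    )

# Invented values for illustration only.
acid_embeddings = {"crisp": [1.0, 0.0], "zesty": [0.0, 1.0]}
acid_tfidf = {"crisp": 1.0, "zesty": 3.0}
dataset_mean_acid = [0.5, 0.5]

with_terms = attribute_embedding(
    ["crisp", "zesty"], acid_embeddings, acid_tfidf, dataset_mean_acid
)
without_terms = attribute_embedding(
    [], acid_embeddings, acid_tfidf, dataset_mean_acid
)
```

The rarer, more heavily weighted descriptor ("zesty" in this toy case) pulls the average toward its own vector, which is the point of weighting by TF-IDF.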

Next, we group all of the wines in our dataset by grape variety and subregion (e.g. Chardonnay from Walla Walla Valley, Washington, USA). Any types of wine with fewer than 30 observations are discarded, as they are unlikely to contain sufficient information to accurately model all attributes.
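The grouping and the 30-observation cutoff could be expressed as follows. The record layout with `variety` and `subregion` keys is an assumption made for this sketch, as is the helper name `group_wines`.

```python
from collections import defaultdict

MIN_REVIEWS = 30  # wine types with fewer observations are discarded

def group_wines(reviews, min_reviews=MIN_REVIEWS):
    """Group reviews by (variety, subregion) and drop sparse groups."""
    groups = defaultdict(list)
    for review in reviews:
        groups[(review["variety"], review["subregion"])].append(review)
    return {
        key: rows for key, rows in groups.items() if len(rows) >= min_reviews
    }

# Toy data: 30 reviews of one wine type, 29 of another.
toy_reviews = (
    [{"variety": "Chardonnay", "subregion": "Walla Walla Valley"}] * 30
    + [{"variety": "Merlot", "subregion": "Walla Walla Valley"}] * 29
)
kept = group_wines(toy_reviews)
```

Only the Chardonnay group survives in this toy example, because the Merlot group falls one review short of the threshold.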

To reduce the dimensionality of the non-aroma attributes, we apply PCA with one component. Since the descriptors within the non-aroma categories are quite uni-dimensional (e.g. low tannin vs. high tannin), we are able to capture the vast majority of the variation within these attributes in just this one dimension. The resulting scalars are normalized between 0 and 1 using a MinMax Scaler.
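To make the reduction concrete, here is a dependency-free sketch that finds the first principal component by power iteration and then min-max scales the projections. The four 2-D "tannin" vectors are invented, and power iteration is used here only to keep the example self-contained; it is not necessarily how the notebooks compute PCA.

```python
import math

def first_component_scores(rows, iterations=100):
    """Project rows onto their first principal component (power iteration)."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    centered = [[r[j] - means[j] for j in range(dims)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]
    # Repeatedly applying the covariance matrix converges to its leading
    # eigenvector, provided the start vector is not orthogonal to it.
    direction = [1.0] * dims
    for _ in range(iterations):
        product = [
            sum(cov[i][j] * direction[j] for j in range(dims))
            for i in range(dims)
        ]
        norm = math.sqrt(sum(x * x for x in product))
        direction = [x / norm for x in product]
    return [sum(c[j] * direction[j] for j in range(dims)) for c in centered]

def min_max(values):
    """Rescale values linearly so the smallest is 0 and the largest is 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Invented vectors lying along one line, from "low tannin" to "high tannin".
toy_tannin_vectors = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
tannin_scores = min_max(first_component_scores(toy_tannin_vectors))
```

Because the toy vectors lie on a single line, one component captures all of their variation and the scaled scores are evenly spaced between 0 and 1. Note that the sign of a principal component is arbitrary, so on real data one would need to check which end of the scale corresponds to "high". In practice, scikit-learn's `PCA(n_components=1)` and `MinMaxScaler` would be the usual tools for this.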

We are left with a dataset of 500 different types of wine, each with a 300-dimensional aroma vector, a scalar for body and one scalar for each of the six other non-aroma attributes.

Pairing Wine with Food

Now for the fun part. What wines can we drink with our Chicago-style hotdog? We need to specify rules for how to match food with wine. We can use a 5-step process inspired by the wine pairing tips laid out in the fantastic book Wine Folly: Magnum Edition: The Master Guide.

(I) The body of the wine should roughly match the body of the food.

(II) Exclude any wines that have non-aroma attributes that are discordant (inharmonious) matches with the non-aromas in the food. Discordant matches are shown in the grey lines below.

(III) Use some rules of thumb to eliminate wines that do not match well (wine should be more acidic than the food, wine should be sweeter than the food, bitter wines do not pair with bitter foods).

(IV) Identify which pairings are congruent or contrasting. Congruent pairings have strong non-aromas that can also be found in the food. Contrasting pairings have strong non-aromas that are harmonious matches with the non-aromas in the food. These matches are denoted by the blue lines in the diagram above. Eliminate any pairings that are not congruent or contrasting.

(V) Rank the remaining wines by the closeness of the wine aroma embeddings to the food embedding. This ensures that food is matched to wines with similar nuanced flavors.
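The five steps can be strung together roughly as follows. This is a sketch under loud assumptions rather than RoboSomm's actual rules: the `STRONG` and `BODY_TOLERANCE` thresholds, the `DISCORDANT` and `HARMONIOUS` pairs, the record layout and all toy values are hypothetical placeholders, since the real matches come from the pairing chart.

```python
import math

NON_AROMAS = ["sweet", "acid", "salt", "piquant", "fat", "bitter"]

# HYPOTHETICAL placeholders: (wine attribute, food attribute) pairs.
DISCORDANT = {("bitter", "piquant")}
HARMONIOUS = {("acid", "fat"), ("sweet", "salt")}
STRONG = 0.6           # assumed cutoff for a "strong" non-aroma
BODY_TOLERANCE = 0.25  # assumed maximum gap between wine and food body

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def passes_filters(wine, food):
    # (I) body of the wine should roughly match body of the food
    if abs(wine["body"] - food["body"]) > BODY_TOLERANCE:
        return False
    # (II) no strong discordant combination
    for w_attr, f_attr in DISCORDANT:
        if wine[w_attr] >= STRONG and food[f_attr] >= STRONG:
            return False
    # (III) rules of thumb; equal acidity or sweetness is allowed here
    if wine["acid"] < food["acid"] or wine["sweet"] < food["sweet"]:
        return False
    if wine["bitter"] >= STRONG and food["bitter"] >= STRONG:
        return False
    return True

def pairing_type(wine, food):
    # (IV) congruent: a strong non-aroma shared by wine and food
    if any(wine[a] >= STRONG and food[a] >= STRONG for a in NON_AROMAS):
        return "congruent"
    # (IV) contrasting: a strong harmonious combination
    if any(wine[w] >= STRONG and food[f] >= STRONG for w, f in HARMONIOUS):
        return "contrasting"
    return None

def recommend(wines, food, top_n=4):
    candidates = []
    for wine in wines:
        kind = pairing_type(wine, food) if passes_filters(wine, food) else None
        if kind is not None:
            # (V) rank by closeness of wine aromas to the food embedding
            score = cosine_similarity(wine["aroma"], food["embedding"])
            candidates.append((score, wine["name"], kind))
    candidates.sort(reverse=True)
    return [(name, kind) for _, name, kind in candidates[:top_n]]

def make(name, body, sweet, acid, salt, piquant, fat, bitter, vector, key):
    return {"name": name, "body": body, "sweet": sweet, "acid": acid,
            "salt": salt, "piquant": piquant, "fat": fat, "bitter": bitter,
            key: vector}

# Invented toy dish and wines.
toy_food = make("dish", 0.7, 0.2, 0.5, 0.9, 0.6, 0.6, 0.1, [0.8, 0.2], "embedding")
toy_wines = [
    make("Wine A", 0.7, 0.3, 0.8, 0.2, 0.3, 0.7, 0.2, [0.9, 0.1], "aroma"),
    make("Wine B", 0.6, 0.7, 0.55, 0.1, 0.2, 0.3, 0.1, [0.2, 0.9], "aroma"),
    make("Wine C", 0.2, 0.3, 0.8, 0.2, 0.3, 0.7, 0.2, [0.9, 0.1], "aroma"),
    make("Wine D", 0.7, 0.3, 0.3, 0.2, 0.3, 0.7, 0.2, [0.9, 0.1], "aroma"),
]
suggestions = recommend(toy_wines, toy_food)
```

In this toy run, Wine C is dropped for being too light-bodied and Wine D for being less acidic than the dish. Wine A survives as a congruent pairing (shared fattiness) and Wine B as a contrasting one (sweet wine against salty food), with Wine A ranked first because its aroma vector lies closer to the food embedding.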

Results

So, what to drink with our Chicago-style hotdog? RoboSomm gives us four suggestions:

The pairing suggestions are very diverse, ranging from a New Zealand Pinot Noir to a Chardonnay from the Inland Valleys in California. All of the pairings have good acidity to match the sourness of the relish, pickle and sport pepper. The fattiness of the Chardonnay mirrors the fattiness of the food. The pairings are medium to full-bodied to match the body of our hotdog. The complementary wine aromas driving the pairing are pepper, smoke and spice; flavors that are reflected in the sausage of our hotdog.

Let’s take RoboSomm for another spin. What about a very different food, like a delicious appetizer with cucumber, smoked salmon, sour cream and dill?

This has a different profile from our hotdog, with a little more fattiness coming through from the salmon and the sour cream. The body is a little lighter due to the influence of the fish and the cucumber, but is still medium due to the presence of the sour cream.

Excellent! We have a couple of white wine options. The wines have a good amount of acid to complement the freshness of the food, and all exhibit some salinity to accompany the smoked salmon. The lemon aromas in the Alsace White Blend and the Israel Chardonnay are good matches with the cucumber and salmon, while the Sauvignon Blanc and New Zealand Pinot have herbal notes that underline the dill in our food.

Our meal wouldn’t be complete without a dessert. Let’s see what wines we could drink with a beautiful slice of peach pie.

The peach pie has a very different non-aroma profile than the foods we have tested so far, with lots of sweetness and fat, and some acidity coming from the fruit.

The wine suggestions are congruent pairings, reflecting the sweetness in our food and the acidity of the peach. Idaho Riesling appears to be a great match with its intense tree fruit aromas and racy acidity. The other suggestions have strong sweet notes too, accompanied by peach aromas to mirror the food.

Conclusion

RoboSomm is able to produce fairly sensible wine pairing suggestions using only natural language processing techniques. This novel approach to generating pairings is promising and opens the door to more out-of-the-box ways of matching wine with food.

For now, let’s raise a glass (of Pinot Noir, Merlot or Chardonnay) to the Chicago-style hotdog.