Clustering AI-Generated Cocktail Recipes Using t-SNE and Plotly

Making Data Intuitively Explorable

Daniel Bojar
Towards Data Science

--

Scroll down for the interactive version!

Recently, I trained a recurrent neural network to generate cocktail recipes in the style of the NYC bar Death & Co. Quite a few of these recipes featured curious combinations of ingredients, which usually turned out great in practice. Yet while tasting some of these AI-generated cocktails was undeniably enjoyable, a systematic analysis of the cocktailAI output would be highly beneficial. Even better (and planned for the future) would of course be deploying cocktailAI as a web tool. As a consolation, you get the next best thing here: hundreds of AI-generated cocktail recipes presented in a format which is intuitively explorable!

But before we get there, let’s first do a fun exercise and analyze the roughly 500 cocktail recipes from the book ‘Death & Co: Modern Classic Cocktails, with More than 500 Recipes’, which I used as the database for cocktailAI. One way to do this is through clustering: grouping similar cocktails and separating dissimilar ones (we’ll leave open what ‘similar’ means for now). Given that a standard plot has two dimensions (the stereotypical x and y), how can we pick two variables from the potentially hundreds of cocktail ingredients in our database? Here, we need a tool known as dimensionality reduction, which takes our data and tries to represent the similarities between recipes, in terms of their ingredients and amounts, in just two dimensions. We’ll use t-SNE (t-distributed stochastic neighbor embedding), a particularly powerful nonlinear dimensionality reduction tool. The coordinates we receive from t-SNE can then be plotted in two dimensions, and we can start analyzing!
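To make the idea concrete, here is a minimal sketch of how recipes could be turned into a numeric table and handed to t-SNE. It assumes NumPy and scikit-learn's `TSNE`; the toy recipes, the ingredient names and the helper `recipes_to_matrix` are made up for illustration, and this is not necessarily how the code behind the plots works.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical toy recipes: ingredient -> amount in ml (not taken from the book).
recipes = {
    "Toy Martini": {"gin": 60, "dry vermouth": 15},
    "Toy Gimlet": {"gin": 60, "lime juice": 22.5, "simple syrup": 15},
    "Toy Negroni": {"gin": 30, "campari": 30, "sweet vermouth": 30},
    "Toy Manhattan": {"rye": 60, "sweet vermouth": 30},
    "Toy Old Fashioned": {"bourbon": 60, "simple syrup": 7.5},
    "Toy Daiquiri": {"white rum": 60, "lime juice": 30, "simple syrup": 15},
    "Toy Margarita": {"blanco tequila": 60, "lime juice": 22.5, "cointreau": 22.5},
    "Toy Boulevardier": {"bourbon": 45, "campari": 30, "sweet vermouth": 30},
}


def recipes_to_matrix(recipes):
    """Build a matrix with one row per recipe and one column per ingredient.

    Each entry is the amount (in ml) of that ingredient, or 0 if it is absent.
    """
    ingredients = sorted({ing for recipe in recipes.values() for ing in recipe})
    column = {ing: j for j, ing in enumerate(ingredients)}
    X = np.zeros((len(recipes), len(ingredients)))
    for i, recipe in enumerate(recipes.values()):
        for ing, ml in recipe.items():
            X[i, column[ing]] = ml
    return X, ingredients


X, ingredients = recipes_to_matrix(recipes)

# Perplexity has to be smaller than the number of recipes; with only eight
# toy recipes a small value is needed. A real dataset would use a larger one.
tsne = TSNE(n_components=2, perplexity=3, init="random", random_state=0)
embedding = tsne.fit_transform(X)  # one (x, y) pair per recipe
```

Each row of `embedding` is the 2D position of one recipe. With a handful of toy recipes the layout is not meaningful; the point is only the shape of the pipeline: many ingredient columns in, two coordinates out.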

Interactive t-SNE plot of Death & Co cocktail recipes. Hover for cocktail name, drag-select for zoom

In t-SNE plots, clusters of points represent groups of similar data points (cocktail recipes in this case). To facilitate identification, I took the liberty of coloring cocktails according to their base (the main type of spirit, defined here as anything above 22.5 ml) for the most common types of liquor. And as you can see amazingly clearly, for most bases we can detect obvious clusters. In case you didn’t notice yet, this is plotted using Plotly, so you can hover your mouse over a point and it will display the name and base of the respective Death & Co cocktail recipe (of course I can’t give you the recipe itself, since it comes from a commercial source). Especially for gin cocktails, we can identify a clear cluster of similar recipes. If you look at whiskey cocktails, an interesting phenomenon can be observed: the three clusters labeled as ‘whiskey’ correspond to cocktails based on the main whiskey variants, bourbon, rye and scotch (plus smaller clusters for Irish whiskey and Japanese whiskey). For tequila, we can notice a clean separation between cocktails relying on blanco tequila and those relying on reposado tequila. And of course there is analogous behavior for white and brown rum. Play around with the visualization, and maybe the next time you are at Death & Co, order an unusual whiskey cocktail which is not in any of the whiskey clusters!
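The coloring rule described above (a spirit counts as a base if the recipe uses more than 22.5 ml of it) could be sketched as follows. The mapping `SPIRIT_CATEGORY` and the function `find_bases` are hypothetical names for illustration; the actual list of spirits and categories behind the plot may differ.

```python
BASE_THRESHOLD_ML = 22.5  # threshold quoted in the text: "anything above 22.5 ml"

# Hypothetical mapping from specific spirits to the color category in the plot.
SPIRIT_CATEGORY = {
    "gin": "gin",
    "bourbon": "whiskey",
    "rye": "whiskey",
    "scotch": "whiskey",
    "blanco tequila": "tequila",
    "reposado tequila": "tequila",
    "white rum": "rum",
    "brown rum": "rum",
}


def find_bases(recipe):
    """Return the sorted base categories of a recipe (ingredient -> ml).

    A spirit only counts if its amount is strictly above the threshold.
    Recipes without any qualifying spirit are labeled 'other'.
    """
    bases = {
        SPIRIT_CATEGORY[ingredient]
        for ingredient, ml in recipe.items()
        if ingredient in SPIRIT_CATEGORY and ml > BASE_THRESHOLD_ML
    }
    return sorted(bases) or ["other"]


print(find_bases({"rye": 60, "sweet vermouth": 30}))  # ['whiskey']
print(find_bases({"gin": 22.5, "campari": 30}))       # ['other']
print(find_bases({"brown rum": 30, "bourbon": 30}))   # ['rum', 'whiskey']
```

Note that under this sketch a recipe can end up with more than one base, which would then need more than one color in the plot.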

Interactive t-SNE plot of AI-generated cocktail recipes. Hover for cocktail name+recipe, drag-select for zoom

So next we arrive at the even more interesting subject of the AI-generated cocktails! I gathered around 400 of them and gave them the same t-SNE clustering treatment. And this time, in the Plotly figure above, you get the recipe of every single one of the ~400 AI-generated cocktails in addition to its name when you hover over a point! For gin and tequila cocktails, we can see a similar picture to the original Death & Co recipes (absolute positions don’t really matter in t-SNE plots, only relative ones). Rum cocktails are now heavily biased toward brown rum (the abundance of brown rum in tiki cocktails skewed the algorithm), and the few remaining white rum points curiously overlap with the cocktails declared as ‘other’. As for whiskey, a strong rye cluster can be identified, along with a somewhat smaller one for bourbon-based drinks. The absence of a scotch-themed recipe cluster is curious indeed. Let me know if you can spot any other patterns in the plots!

One reason why all of this is great is that you can specifically search for a recipe relying on a given base without having to scroll through a list or Excel file. If you want to do the same with your own list of cocktail recipes, you can find my code on GitHub. Additionally, if you’re feeling particularly adventurous, you can look for cocktail recipes located far away from their cluster and see if they have a funky list of ingredients! Keep in mind though that the AI-generated recipe database shown here is not curated. There may be required ingredients that don’t exist (yet), and there may be mistakes (both in spelling and conceptually). Additionally, some recipes which contain ‘two bases’ have two colors directly on top of each other, which can be a bit confusing. So tread carefully in this exciting new world, brimful of discoveries to be made and bounties to harvest!
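If you would rather not hunt for far-away points by eye, one simple heuristic (my own assumption, not part of the original analysis) is to rank cocktails by their distance from the centroid of all points sharing their base. The names and coordinates below are invented, and since t-SNE distances are only a rough guide, treat the ranking as a list of candidates to inspect rather than a verdict.

```python
from collections import defaultdict
from math import dist
from statistics import mean

# Hypothetical 2D t-SNE coordinates: (name, base, x, y).
points = [
    ("Toy Martini", "gin", -4.0, 1.0),
    ("Toy Gimlet", "gin", -3.0, 2.0),
    ("Toy Negroni", "gin", -5.0, 0.0),
    ("Toy Oddball", "gin", 8.0, 9.0),
    ("Toy Manhattan", "whiskey", 5.0, -2.0),
    ("Toy Sazerac", "whiskey", 4.0, -3.0),
]


def rank_by_distance_to_centroid(points):
    """Sort cocktails by distance to the centroid of their base, farthest first."""
    coords_by_base = defaultdict(list)
    for _name, base, x, y in points:
        coords_by_base[base].append((x, y))

    # Centroid = mean x and mean y of all cocktails with the same base.
    centroids = {
        base: (mean(x for x, _ in coords), mean(y for _, y in coords))
        for base, coords in coords_by_base.items()
    }

    ranked = [
        (dist((x, y), centroids[base]), name, base)
        for name, base, x, y in points
    ]
    return sorted(ranked, reverse=True)


ranked = rank_by_distance_to_centroid(points)
for distance, name, base in ranked[:3]:
    print(f"{name} ({base}): {distance:.2f}")
```

In this toy data the first entry is the deliberately misplaced "Toy Oddball", which sits far from the other gin points.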

--

Machine Learning, Glycobiology, Synthetic Biology. Strong opinions, weakly held. Fascinated & Inspired by Counterintuitives. @daniel_bojar & dbojar.com