The world’s leading publication for data science, AI, and ML professionals.

The Travelling Apothecary

A discrete optimization approach to Skyrim's alchemy system.

Image by Joshua Littles
Image by Joshua Littles

Hands-on Tutorials

Table of Contents

Introduction

Whether you are replaying the game for the 5th time or on your 15th character (unable to remember what you were doing with the other 14 saves when you last had free time), Skyrim still holds a warm spot in the hearts of many players. For me, this replayability does not extend to the alchemy system within Skyrim. Nearly a year ago, I had some free time over the holidays and was starting up another character. I was trying to find a way to avoid the grind of unlocking ingredient effects when I recognized this could be represented as an optimization problem.

I would like to share with you the premise behind this problem and the approach with explanations (written in R).

Premise

Skyrim is a role-playing game set in a medieval fantasy world. There are several mechanics the player can master within the game. These include archery, swordsmanship, blacksmithing, magic, and alchemy among others. Alchemy, in particular, involves collecting 111 different ingredients (from the wild, purchasing from shops, or occasionally finding them as loot from fallen foes).

  • In Skyrim’s alchemy system, a potion is created using 2–3 ingredients and has all the effects that are shared between any 2 of the ingredients.
  • Each ingredient has 4 effects.
  • The effects are all unknown unless revealed by creating potions with those effects or eating the ingredient can reveal effects 1–4 depending on the level of your Experimenter perk (requires level 90 in Alchemy before you can fully unlock this perk).

A player on his first playthrough will begin by randomly testing hundreds of ingredient combinations that will often fail to produce viable potions. By the end of their playthrough, most will eventually go to a wiki site and look up the effects of unknown ingredients. Others will invest some points into the Experimenter perk and eat any ingredient they see with unknown effects.

On subsequent playthroughs, many players will open up the console and cheat in the Experimenter perk early or go to online forums for an optimal route to unlocking effects. This forum post is the most well done of these posts that I have seen, but unfortunately, it only contains ingredients from the base game (none from the additions to the game). For reference, the recommendations by the players on this board consist of 74 potions to unlock the effects for the 90 ingredients in the base game. The slower of the two algorithms mentioned in this article can unlock all these effects using only 67 potions. Unfortunately, the faster of the algorithms uses 82 potions due to it currently leaving isolated pockets of unknown effects (This will be discussed in the Testing section of the article).

When effects are known, you can easily filter by which effects you want and choose ingredients accordingly within Skyrim’s own crafting system (at the alchemy table). For this reason, revealing the effects the first time will subsequently provide a great deal of convenience for players.

Since I am not willing to level up to 90 in Alchemy before unlocking these ingredient effects, we will attempt to unlock effects by creating potions using the fewest number of ingredients.

The Problem

We will be setting up this data in a directed graph where each ingredient connects to its associated effects.

Libraries

I will assume you have some familiarity with R and the libraries I will be using. I will provide explanations/comments that should provide sufficient context if you aren’t familiar with any particular library.

The following libraries will be used (links lead to cran for each)

  1. stringr: present in your base install of R for string manipulation
  2. rvest: for web scraping
  3. igraph: provides a graph data structure and associated functions
  4. dplyr and tidyr: for data frame manipulations

The Data

We will be scraping the data from a fan-based wiki using the following code:

This data frame should look like the following:

We now must decide how to represent this data to most efficiently use it for this optimization problem. I have chosen a fairly simple graph representation where the nodes consist of the ingredients and effects and the edges represent connections from ingredients to their effects. (One important point to note, I am adding in a property count which we will discuss later)

The following code will set up this data in an igraph object for use in our functions:

Notice that this creates a graph where every node has the following properties:

  • Name: Accessed via V(alchemyGraph)$name Note that igraph ignores the first column and references it via the lowercase name
  • Type: indicates whether the node represents an ingredient or an effect. Referenced via V(alchemyGraph)$Type
  • Count: A placeholder to indicate the user’s current inventory of the ingredient (always NA for an effect). V(alchemyGraph)$Count
  • Comments: Comments from the scraped data. We won’t be using this field in this exercise

Also, every edge contains a boolean property Known, which represents whether or not that ingredient-effect relationship is unlocked by the player. Accessed via E(alchemyGraph)$Known

The resulting graph contains 111 ingredients, 55 effects, and 444 edges. For a sense of scale, it can be visualized using the following:

plot(as.undirected(alchemyGraph), 
     vertex.label = NA, 
     vertex.size = 4)
Light blue nodes are effects and the red nodes are ingredients
Light blue nodes are effects and the red nodes are ingredients

Functions

Below we will define several utility functions. Error handling is included for some. My apologies for the length of this section and congrats if you can follow them all. Feel free to use the table of contents if you would like to skip this part.

First, we will define a few functions that will get ingredients and their effects back out of the graph.

Next, we will define functions to set/get the ingredient count as well as set the Known property of ingredient-effect edges.

Next will be functions for calculating the effects present in a potion given the ingredients used and another to calculate the number of ingredient-effects unlocked if the supplied ingredients were used. We will also add a function to simulate creating a potion using a given graph as well as one to evaluate the number of ingredients a series of potions reveals.

Now a couple of functions that are a little more abstract and will be used to get all possible combinations of a given set of ingredients for use in the main function. (The expand grid function is a modification of one found on stack overflow. The original function is linked in the comments)

Approach

I went through 5 alternative versions of the algorithm to attempt to gain processing efficiency without sacrificing results. One of which pre-loaded the graph with all potential potion combinations and connected those potions to a single root node for searching (root > potion > ingredients > effects). The following two algorithms keep the same graph structure defined above and show minor tradeoffs in terms of speed and efficiency. In a Gremlin graph database, there are a few traversal strategies that could make this easier than when done in R.

Approach 1

I mentioned I would explain the need for the Count property. The simplest approach would assume that we have all the ingredients and I want to know the fewest potions I should make to unlock all ingredient effects (With this algorithm that answer is 78). With the Count property, we can use subgraphs to only calculate for those ingredients we currently have in our inventory. This makes the solution much more dynamic and useful to the adventurer who wants to be an apothecary but hasn’t found the rarest of ingredients yet.

The function recommendPotionForEffectReveal() will serve as the primary logic for the optimization. This function will take a graph in the form of alchemyGraph above, and return a vector (length 2 or 3) of the ingredients to use to make the next potion. This recommended potion should be the one that will reveal the highest number of unknown ingredient-effect edges with a secondary priority of fewest ingredients. Other factors we could also prioritize are cost/rarity of ingredient (not included in this algorithm) and quantity of ingredients (showing a preference for unlocking more effects for the ingredient we have fewer of)

We will cover this function step by step. Before we begin, let’s look at a simple subgraph consisting of only 4 ingredients

plot(sampleGraph, 
     layout = coords,
     vertex.label.color = V(sampleGraph)$color,
     vertex.size = 0)
Ingredients in Red
Ingredients in Red

Next, we’ll showcase known/unknown edges by using a dotted line for unknown ingredient effects. Also, you’ll note that this graph has effects with only one edge. This would not happen in the original graph. For now, we will mark those edges as known to ignore them.

Dotted Lines indicate unknown ingredient-effect relationships (from the player perspective)
Dotted Lines indicate unknown ingredient-effect relationships (from the player perspective)

In the first phase of the algorithm we will create the following subgraphs:

  • sg: A subgraph showing ingredients with a count > 0 and effects that connect to more than one present ingredient.
  • sgUnknown: A subgraph of sg where we delete all Known edges.

Left: sg | Right: sgUnknown
Left: sg | Right: sgUnknown

Next, we will determine the ingredient to start from by finding the ingredient with the highest degree in sgUnknown (Ingredient with most unknown effects)

In this case, we will be starting from "Bear Claws". Which has 4 unknown effects (tied with "Hanging Moss") as seen in the depiction of sgUnknown above. Next, we will perform a breadth-first search (bfs)from Bear Claws and get all neighboring ingredients (Those that are 2 steps away in bfs).

For this version of the algorithm, we need to consider any potentially relevant potion combinations. This will be broken up into 3 parts:

  1. All combinations of the ingredient and those neighboring ingredients. (2 ingredients total)
  2. The combinations using the first ingredient and then all potential combinations of the neighboring vertices (if there is more than 1 neighboring vertex)
  3. The final consideration is for cases where the first ingredient is paired with one of the vertices2Away. Then we look at effects that are neighbors of the second ingredient but not the first and if those effects have any unknown edges and more than one edge overall (To ensure the effect can be revealed by using two ingredients). Then we consider all the pairings of those ingredients neighboring the effects.

Unfortunately, this process in its current form can produce duplicates, so those will need to be cleaned up before checking the resulting potions.

This results in 6 combinations of the ingredients representing 6 different potions. In this simple subgraph, we are using all 4 ingredients, but, in the full graph, this can greatly reduce the number of combinations that need to be checked.

At this point, if the length of potionCombinations was 0, we would exit the function (returning NA).

Next, we will get some metadata for each of the potion combinations. We will check the following:

  1. Number of revealed effects (effect edges that were previously unknown, but would become known if the potion were made).
  2. Number of ingredients in the potion that have a count of only one in the player inventory.
  3. Number of total ingredients used.

Using this information we will prioritize revealing the highest number of effects, showing preference to potions that use 2 ingredients rather than 3, and using potions that use the most ingredients that only have 1 in the inventory (Rationale for this is to assume that the potion using that ingredient is revealing more effects now than if it were used after further iterations).

And based on this we will be making a potion with Bear Claws, Eye of Sabre Cat, and Hanging Moss.

Left: Original SampleGraph | Right: After one iteration of the algorithm
Left: Original SampleGraph | Right: After one iteration of the algorithm

Putting this all together we have the following function:

We will now call this in a loop until no more potions are returned.

Approach 2

The second approach is much more efficient (in terms of processing time) in a graph with many unknown edges. This approach uses similarity scores of the ingredients (scoring based on the number of shared neighbors vs the number of total edges). Using a rare spark of inspiration, we will call this function recommendPotionForEffectReveal2.

The second approach starts out much the same. We do add an additional subgraph (sgKnown) which shows only the known edges. We also calculate the similarity scores of all the Ingredient vertices left in sg.

If there are any similarity scores greater than 0, we can use this matrix to identify the first two ingredients of our potion.

We then look at any other ingredients that share the highest similarity with ingredient 1 or 2. In this function, if there are any similarity scores greater than 0 among the remaining ingredients, we will select whichever is highest and any ties will go to matching with ingredient 1.

The more complex part of this function comes in the else block. This block is checking for any unknown edges between any of these ingredient pairs where one ingredient knows the effect, but the other does not (These would not show up in this similarity matrix as these were based on sgUnknown).

If the similarityMatrix did not have any values above 0, it would imply that no two ingredients shared the same unknown effect (One might know the effect, but we won’t see it in sgUnknown). When this case occurs, we will simply call the original function from approach 1. (To test the speed of this algorithm and exit early when this condition is met, you can simply return NA here. When testing on a full graph with all unknown edges the return time was ~20 seconds and only 24 edges remained unknown.) The final function and the looping function are below.

Testing

We will set up two versions of the graph for testing.

  • AlchemyGraph_Full: Counts set to 50, edges unknown
  • AlchemyGraph_Partial: Counts set to random number 0–5 for 3/4 of the ingredients, 1/2 of edges set to unknown

    Now we will perform 4 runs, one for each graph and algorithm.

    You can see that in each case, the second function is much faster. Regarding potion/ingredient efficiency:

  • In alchemyGraph_Full, the first algorithm was able to reveal all effects in 78 potions while the second used 90 potions.
  • In alchemyGraph_Partial (Your values will be somewhat different than mine since we randomly assigned the properties), my run of the first algorithm revealed 157 effects using 41 potions, and the second algorithm used 48 potions, but only revealed 153 effects.

Based on the above, the second algorithm performs fairly well for the increase in speed. The inefficiencies found in function 2 could be a result of it leaving small outliers when optimizing for the highest number of effect reveals. Potentially, tweaking the portion of the algorithm that determines the third ingredient (if one is used) could cause this function to leave fewer unknown edges stranded.

As a visual aid, below are some gifs showing the graph as each algorithm processes it. For reference:

  • Dashed line: Unknown relationship
  • Solid line: Known relationship
  • Blue node: Effect
  • Green node: Ingredient with all effects known
  • Yellow node: Ingredient with some effects known
  • Red node: Ingredient with no effects known
  • Magenta node: Ingredient currently being used in a potion

alchemyGraph_Full using Approach 1: 444/444 effects revealed, 78 potions, most revealed with a single potion is 12. (run1)

alchemyGraph_Full using Approach 2: 444/444 effects revealed, 90 potions, most revealed with a single potion is 12. (run3)

alchemyGraph_Partial using Approach 1: 157/222 effects revealed, 41 potions, most revealed with a single potion is 8. (run2)

alchemyGraph_Partial using Approach 2: 153/222 effects revealed, 48 potions, most revealed with a single potion is 8. (run4)

These algorithms don’t always find the single most optimal mix of ingredients, but they do a decent job without sacrificing too much in terms of time and they do so while dynamically keeping the player’s own inventory and known/unknown effects in mind.

I’m hoping you enjoyed this guided example and it has given some of you inspiration to apply what you’ve been learning to unconventional problems. Any feedback is always appreciated.


Related Articles