
Breaking the Game: Pendragon Four Rise of Merlin

Multi-agent reinforcement learning for the mobile game Fate Grand Order: adding powerful supports to clear late-game content


In my first Pendragon Four blog I introduced my multi-agent Reinforcement Learning (RL) setup for the mobile phone game Fate Grand Order (FGO) and listed a few goals:

1) Add supports to the game

FGO does not have a player vs player aspect and basically all the characters are viable for use, but it still has a reasonably defined meta. In gaming, the meta roughly describes strategies, characters, or weapons that are more dominant than others. In FGO the meta is defined largely by the support characters that are available which enable powerful/dominant strategies. For the North American server of FGO the first real game breaking and meta defining support was Merlin who was released at the end of 2018.

Merlin the Mage of Flowers is FGO’s version of the magician/wizard/druid most people are familiar with as the king maker in the Legend of King Arthur. I say that Merlin is a meta defining character in FGO because he provides healing to help teams sustain enemy damage, provides invincibility, charges team’s ultimate abilities, and if all that isn’t enough he can greatly amplify the damage output of the team.

So if the title and the past few paragraphs were not enough foreshadowing I decided to add Merlin in as a character to my game environment to train an agent to use Merlin’s abilities.

2) Clear end game content using some sort of multi agent framework

I wasn’t planning on formally trying to clear any sort of late game content for the time being, but adding Merlin into the mix made it so I almost had to in order to provide an adequate challenge for the agents. So I selected a relatively high level quest to replicate as an environment and built a bot team to play through it.

3) Get some version of this project accepted as a talk somewhere

Who knows if that will happen but I shall most certainly try.

Merlin Breaks the Game

Merlin’s introduction to FGO basically broke the game in the sense that he turned what should have been challenging content into fairly trivial fights, thanks to the extremely high amounts of healing and damage boosts he adds. Once I coded him up and started to train bot teams with him as a member, it turned out Merlin broke my game environment as well.

Technical Implementation of Merlin

From a technical point of view, Merlin was slightly different from my previous agents, who were all damage-dealer type characters. Those damage dealers essentially all had abilities that affect either themselves or the entire team; since each character has three abilities, the action space for each of them is 4, including the option to pass. Merlin, however, ends up with an action space of 6. His abilities are a team-wide buff, team-wide invincibility, and a targeted damage boost. The two team-wide skills represent two possible actions and pass is the third, while the targeted damage boost, a new addition for one of my Pendragon agents, is expanded into three possible actions, one per teammate, so the Merlin agent can choose which of the three characters on the team to target with that skill.
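The expanded action space can be sketched roughly as follows. The enum names here are illustrative, not the project's actual identifiers:

```python
from enum import IntEnum

# Hypothetical sketch of Merlin's 6-way action space described above.
# The targeted damage boost is expanded into one action per ally,
# so the agent picks the target by picking the action.
class MerlinAction(IntEnum):
    PASS = 0
    TEAM_BUFF = 1             # team-wide attack/NP buff
    TEAM_INVINCIBILITY = 2    # team-wide invincibility
    DAMAGE_BOOST_ALLY_0 = 3   # targeted damage boost on teammate 0
    DAMAGE_BOOST_ALLY_1 = 4   # targeted damage boost on teammate 1
    DAMAGE_BOOST_ALLY_2 = 5   # targeted damage boost on teammate 2

assert len(MerlinAction) == 6  # vs. an action space of 4 for the damage dealers
```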

In addition to these skills, Merlin is also the first character I added whose ultimate ability, his noble phantasm (NP), is not damage related. In my custom game environment, when previous characters charged their NPs to 100% I would have them just deal a bunch of damage, but for Merlin I had to make some modifications. Merlin’s NP is called "Garden of Avalon", and I leveraged the mechanics I had already added to track skills and their effects over their duration. So whenever Merlin charges his NP to 100% in the game environment, his NP gets applied like a skill to himself and the rest of the agents.
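A minimal sketch of what applying a non-damage NP as a duration-tracked effect might look like, assuming a simple per-member effect list. All names are my own, and the 5-turn, 5%-per-turn numbers come from the Garden of Avalon description later in the post:

```python
from dataclasses import dataclass

@dataclass
class TimedEffect:
    """A skill-like effect tracked over a fixed number of turns."""
    name: str
    turns_left: int
    heal_per_turn: int
    np_charge_per_turn: float

def apply_garden_of_avalon(team_effects):
    # When Merlin's NP hits 100%, apply it like a skill to every party member.
    for effects in team_effects:  # one effect list per party member
        effects.append(TimedEffect("Garden of Avalon", 5, 5, 0.05))

def tick_effects(effects, member):
    # Each turn, active effects heal and charge NP, then count down.
    for e in list(effects):
        member["hp"] += e.heal_per_turn
        member["np"] = min(1.0, member["np"] + e.np_charge_per_turn)
        e.turns_left -= 1
        if e.turns_left == 0:
            effects.remove(e)
```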

Immediate Ramifications

Once I got Merlin coded up as a character and the game environment was able to handle him, I found that I had an interesting problem. My agent teams were winning upwards of 90% of the time in my most difficult game environments as soon as they were initialized. Deceptively, this is a bad thing. What I found is that the agents trained in this setup basically did nothing. They would not use skills, and they would never actually do anything interesting… In comparison, the best my previous bots had ever done was a win rate between 85–90% once they were fully trained.

The difference was Merlin.


Essentially, Merlin as a character provided so much utility that the bots could do basically whatever they wanted and still win. His NP, Garden of Avalon, let the bots effectively heal through almost all the damage the team received, and Merlin’s damage buffs let the agent team win regardless of what each agent did individually. In that version of my game environment, the bot teams were rewarded based on winning: +1 for a win and -1 for a loss. However, in a situation where all the team does is win, the actions that get reinforced are the ones that are most common.

Each character has three abilities, each usable once per quest. A normal quest lasts between 7 and 15 turns, so for most of it the agents have to pass. With this setup, that means the bots learned to pass and therefore to do nothing.
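To make the failure mode concrete: in a pure policy-gradient setup with a flat, undiscounted +1 per win, every action in a winning episode receives the same positive credit, so the most frequent action, passing, accumulates the most reinforcement. A toy illustration, not the project's actual training code:

```python
import numpy as np

def action_credit(actions_taken, episode_reward, n_actions):
    """Tally the (undiscounted) reward credit each action accumulates
    from one episode, REINFORCE-style: every action taken shares the
    same flat episode reward."""
    credit = np.zeros(n_actions)
    for a in actions_taken:
        credit[a] += episode_reward
    return credit

# A 12-turn win where 9 of the actions were "pass" (action 0):
g = action_credit([0] * 9 + [1, 2, 3], episode_reward=+1.0, n_actions=6)
# "pass" accumulates by far the largest positive credit,
# so the policy drifts toward doing nothing.
assert g[0] == 9.0 and g[0] > g[1]
```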

Learning with Merlin

It took me a few days to figure out how to deal with this, because technically the bots were winning and doing well; they just were not learning what I felt was useful behavior. Part of me wanted to keep making the game environment more and more difficult, but at a certain point the environment would stop looking like a reasonable FGO level, and the training environment would no longer really represent the actual game.

My breakthrough here came about when I was rereading Andrej Karpathy’s blog about teaching a RL agent to play Pong.

Andrej Karpathy’s blog is a great read!

If you haven’t read it and are interested in RL, it is worth your time! Anyway, something I noticed on this rereading was Andrej’s note that we should be:

encouraging and discouraging roughly half of the performed actions

This got me thinking about my situation with Merlin and the difficulties he was causing. In my implementation I do not discount future rewards and just apply a simple +1 or -1 reward based on winning or losing, so some of the methods listed in the blog don’t apply. The concept of discouraging roughly half the actions an agent takes does apply, though; the question was how.

The first effective change was setting a turn limit the bots had to win by, or else they would receive a small negative reward. For example, with a turn limit of 12, if a bot team won in 10 turns it would receive the full +1 reward, if it won in 13 turns it would receive a -0.25 reward, and if it lost it would receive the full -1 penalty. This reward structure means the bots learn that they can’t just win; they have to win quickly.
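A minimal sketch of this reward shaping, using the numbers from the example above:

```python
def episode_reward(won, turns, turn_limit=12, slow_penalty=-0.25):
    """Flat +1/-1 episode reward, with a small penalty for slow wins.
    Numbers match the example in the text (limit of 12 turns)."""
    if not won:
        return -1.0
    return 1.0 if turns <= turn_limit else slow_penalty

assert episode_reward(won=True, turns=10) == 1.0    # fast win: full reward
assert episode_reward(won=True, turns=13) == -0.25  # slow win: small penalty
assert episode_reward(won=False, turns=15) == -1.0  # loss: full penalty
```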

However, this method imposes a slightly artificial constraint on how quickly the bots should win. To fix this, I ended up making the bots win faster than their past selves in order to collect full rewards.

Since I already track the number of games won per 1,000-game period as part of my training metrics, I started logging how long those games were and used the average game length over one 1,000-game period to set the number of turns required to collect the full +1 reward. This means the bots are always trying to win faster than their previous selves.
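One way this adaptive target could be implemented, as a sketch; the class and method names are my own, not the project's:

```python
from collections import deque

class AdaptiveTurnTarget:
    """Full +1 reward requires beating the average length of wins
    over the last `window` games, so the bar tightens as play improves."""

    def __init__(self, window=1000, initial_target=12):
        self.win_lengths = deque(maxlen=window)  # rolling record of win lengths
        self.initial_target = initial_target

    def record_win(self, turns):
        self.win_lengths.append(turns)

    @property
    def target(self):
        # Fall back to a fixed limit until we have data.
        if not self.win_lengths:
            return self.initial_target
        return sum(self.win_lengths) / len(self.win_lengths)

    def reward(self, won, turns):
        if not won:
            return -1.0
        return 1.0 if turns <= self.target else -0.25
```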

Using this reward method, Pendragon Four learns interesting behavior even after the addition of Merlin. Once I had the bots learning again, with win rates upwards of 97% on my previous game environment, I felt it was time to see if I could train them to clear some harder FGO content. FGO quests are rated by difficulty based on the level of team required to clear them. Up until now most of my benchmarking has been based on a mid-level quest that is common for farming, a level 40 quest. The highest quest level in FGO is effectively 90, with a few 90+ quests that are typically boss fights of some sort. The quest I selected for this foray into more difficult content is a level 78 quest called The Execution Site on Gallows Hill, part of an FGO story arc set during the Salem witch trials.

The Execution Site

While I could have selected a different mission for benchmarking, I chose the Gallows Hill Salem mission because it’s the best drop site for an item I currently need a bunch of, called "Stake of Wailing Night".

fgo wiki

To level up a character’s stats you typically need to gather specific materials. The character I am working on requires 216 Stakes of Wailing Night for full leveling, and it takes an average of 1.5 runs of this quest for a single stake to drop. Yes… this game has a lot of farming in it.

Building the Level

The structure of the level follows a pretty standard format for late-game FGO content: there are three waves, with the difficulty increasing from wave one through wave three.

The first wave is pretty easy; the total health pool of its three enemies is 38,000, which a high-level team can deal with fairly quickly.

Wave 1 Enemies

Wave two scales up the difficulty with beefier enemies. As a quick benchmark, the enemies in this wave have 70,813 health in total, and one enemy alone has as much HP as the entire first wave.

Wave 2 Enemies

Now onto the final wave, wave three. For this mission, wave three has 290,058 HP, more than double the two previous waves combined. It can get tricky if you get stuck on it for a prolonged period, since the main enemy, with 192K HP, launches a powerful attack every 4 turns. While benchmarking my Pendragon Four agents, the randomly initialized teams would typically get to this third wave, get stuck trying to whittle down the enemy HP, and die.

Wave three Enemies

So in building out this level, there were two things I wanted to do. First, scale the level appropriately so the agent teams would have to get through a realistic amount of health. Second, to raise the difficulty, I wanted the agents to have to deal with enemies that use more powerful attacks once charged up.

For the first point, I gave the enemies in every wave more health than in my previous hard-difficulty levels. Wave one has 30 health, wave two has 55, and wave three has 140, for a total of 225, up from the previous highest I had trained on at 180.

For the second point, I added mechanics that let enemies deal more damage every X turns. For example, the final wave deals its normal damage plus 10 additional damage every 4 turns. The agent team has 30 total health, so a prolonged fight on the third wave can pretty easily defeat it.
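These two mechanics can be sketched together, using the health and damage numbers from the text; the function and variable names are illustrative, not the project's:

```python
# Scaled per-wave health totals from the text (225 total, up from 180).
WAVE_HP = {1: 30, 2: 55, 3: 140}

def enemy_damage(base_damage, turn, charge_turns=4, bonus=10):
    """Periodic charged attack: every `charge_turns` turns the enemy
    deals its normal damage plus a bonus (here +10 on a 4-turn cycle,
    as described for the final wave). `base_damage` is a placeholder."""
    if turn > 0 and turn % charge_turns == 0:
        return base_damage + bonus
    return base_damage

assert sum(WAVE_HP.values()) == 225
assert enemy_damage(3, turn=4) == 13  # charged turn
assert enemy_damage(3, turn=5) == 3   # normal turn
```

Against a 30-HP agent team, a single unblocked charged hit removes a third of the team's health, which is why getting stuck on wave three is so dangerous.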

Once the new mechanics and specifications were in place it was time to train and benchmark Pendragon Four.

Training and Results

After my other various pipeline improvements I was able to get the environment coded up and decided on the team that I would have the agents play as.

The team would be a two-damage-dealer composition with Merlin as the support character. The two damage dealers would be Ishtar, whom I had previously coded up as a usable character, and a new damage dealer nicknamed Nero Caster, whom I wanted to use because she has favorable match-ups for this level.

Ishtar, Merlin, and Nero Caster

Once the team was in place we were ready to roll!

Roughly following my training protocols for Pendragon Four, I let the bots play and monitored their progress; see a sample run below. Training starts with a high exploration and learning rate for the first 20,000 games, which is why there is no improvement in the first section of the graph. I then decrease the exploration and learning rates every 10,000 games or so until 70,000. As we would hope, the win percentage increases over the course of the training run (blue) while the number of turns per win decreases over the same period.

Over 70,000 games the average turns to win (red) starts at 9.75 and ends around 7. The overall win percentage starts at 45% and ends up at 80%.

These training results were heartening since the bots appeared to learn and it seemed like an adequate challenge for them since the win rates never got extraordinarily high.

So on paper this seems successful, but now that Pendragon Four is playing more difficult content I wanted to benchmark it against playing the levels myself.

Breaking Down Pendragon Four’s Play

To benchmark how Pendragon Four plays, I had the trained Pendragon Four agents and myself each play 10 rounds of The Execution Site in the actual FGO game, both of us under the same constraints:

  1. Skills can only be used once
  2. Once a character’s Noble Phantasm (NP) is charged to 100% it must be used.
  3. Waves must be fought front to back (no re-targeting)

The results are in the following table:

For benchmarking, I included runs from randomly initialized agents. I added the asterisk next to their "mean" because the randomly initialized agents would stumble through the first two waves, get to wave three, and then die after being stuck there for a prolonged period. Watching this was depressing, and since it costs in-game resources to revive a team and continue playing, spending resources this way seemed like a waste, so I stopped after five or so rounds. In comparison, neither the trained Pendragon Four agents nor I ever lost the level.

As a human player I still made use of more game information, like which characters the dealt command cards belonged to, which I think helped me play faster than the bots. On paper I had expected the bot behavior to be quite inferior to my human strategies, but I was surprised to find that the bot strategies work well and are actually quite consistent.

One of my goals as I have improved Pendragon Four has been to remove myself, and my beliefs about how to play the game, from the pipeline as much as possible. This has mostly meant switching to pure policy-gradient approaches with +1 and -1 rewards, and designing the new level without incentivizing certain actions via damage boosts or other mechanics. Looking at the results here, I think that endeavor has been fairly successful.

Opposing Strategies

Whatever strategy gets used, the only way to really win a quest like this is to clear it with powerful damage-dealing NPs. My basic strategy is to save my team’s NPs and fire them off only on the third wave to clear it as fast as possible, while Pendragon Four uses a very different strategy and actually fires off its team’s NPs twice per game.

My Approach

As a player I tend to play FGO by saving skills and NPs for the final wave of enemies, because final waves are usually fairly powerful and saving firepower for that difficult confrontation makes sense. From a gameplay point of view, this means I typically use either no skills on waves one and two, or just a few attack buffs to get through them quickly. Then on wave three I use my team’s utility skills to charge NPs up to 100% and win relatively quickly. Over the course of the level I rely on using each damage dealer’s NP once.

Pendragon Four’s Approach

However, Pendragon Four does almost the exact opposite. It uses its utility skills to charge Ishtar’s and Nero’s NPs and immediately clear the first two waves, while saving most of its combat buffs for the third wave. Once there, it plays its attack buffs and Merlin’s support abilities to increase its turn-by-turn damage, healing, and NP recharge, and wins by using Nero’s and Ishtar’s NPs a second time over the course of the level.

Breaking Down Pendragon Four’s Approach

On paper I thought this wouldn’t fare well in the actual FGO level because I thought that the agent team would get stuck on the third wave and die. However, I was pleasantly surprised and feel that I have a few things to learn from the agents.

Pendragon Four Turn 1: Ishtar charges her NP and clears the wave

Early Game: Waves One and Two

Looking at the gameplay, it seems Pendragon Four uses its NPs to instantly clear the first two waves because it lowers the possible variation and very consistently gets the bots through those initial waves. One danger it may be avoiding: there were several runs where I played in my normal way but accidentally got NPs fully charged, forcing me to use them at bad times and go into wave three at a disadvantage. Those runs took 9 turns or more to complete, slower than the average Pendragon Four time. So Pendragon Four’s decision to use those NPs and utility skills immediately negates this potential danger.

Pendragon Four Turn 2: Nero Caster charges her NP and uses it to clear the wave

Going into wave three, the agents have relatively little charge on their NPs, which would concern me as a human player: the lower the team’s NP charge, the longer they have to spend on the dangerous third wave before they get their NPs back up. This turned out not to be a problem, because Pendragon Four was able to use Merlin effectively.

Late Game: Wave Three

Once the game gets to the third wave, Pendragon Four uses Merlin’s NP, Garden of Avalon, which gives the party 5 turns of healing and boosts their NP charge by 5% every turn. That, plus his team-wide attack buff, which also grants 20% NP charge, means he can give the entire team 45% NP charge over that window. Combined with the healing and invincibility he offers, the team has enough sustain to last through any potential damage.

Turn 3: Attack buffs and Merlin’s NP "Garden of Avalon".

I mentioned before that Merlin is overpowered, and one of the things I was hoping for was that Pendragon Four would learn to use him in interesting ways. Something I hoped for, but was not guaranteed to see, was Pendragon Four learning to block the third wave’s powerful attacks with Merlin’s team-wide invincibility skill. The third wave uses powerful charged attacks every 4 turns; Pendragon Four learned that pattern in training and successfully blocks those charged attacks with Merlin’s team-wide invincibility (see below). Using that skill at the correct time, combined with the healing from Merlin’s NP, means the bots can fight fairly safely on wave three until they recharge their NPs and win.

Merlin agent uses invincibility to negate charged enemy damage.

Closing Thoughts

Breaking down the strategies Pendragon Four has learned for this post has been an interesting experience, since I haven’t really looked at the strategic aspects of my agents’ play before. As I mentioned, I am particularly pleased that they developed a playstyle different from my own, since it shows some of the fun of reinforcement learning: letting agents freely explore their environments and action spaces.

The tendency to use abilities more readily than a human player is one of the interesting behaviors I’ve seen in other RL projects such as OpenAI Five. Something people commented on about the OpenAI Five agents is that they had fewer reservations about using long-cooldown abilities than humans, who tend to play like I do and hold onto abilities looking for a more "optimal" time to use them. It might also be a function of how agents are incentivized to consume a resource immediately rather than save it, since if they save it there is no guarantee they will get to use it in the future. This sort of shorter-term planning is interesting, but I also wonder what it would take to make agents more conservative in their ability usage. Then again, trying to make the agents more conservative might just be me trying to make them more like "me" than they need to be.

While their strategies are different from mine, they are still quite effective, and I am very pleased with the results. I think it will be interesting to throw differently structured missions at them. For instance, certain quests have extremely powerful final bosses that I think would be difficult to clear after burning NPs too early in the quest. It would be an interesting test to see how Pendragon Four adjusts to that level of difficulty.

Going forward, I think a lot of my work will be around adding harder content and training agents that can generalize better across different quests. One interesting note I have seen in OpenAI’s work is the randomization of levels and agent stats to promote greater exploration of the environment, as well as the way their agents are trained as perfect "clones" of one another. This opens the possibility of a set of three agents that share the same weights but can freely play any set of FGO characters and clear missions.

Ending the game with Ishtar’s NP. Nero Caster used hers the previous round.
