Deep Learning vs Puzzle Games

Is deep learning better suited to solving Flow Free than good old brute force techniques?

Kabalan Gaspard
Towards Data Science


We’ve all been there. You’re bored on your phone and have some time to kill, so you decide — against your better judgement — to visit the games section of the app store to see what’s trending. You see a puzzle app that looks fun, but it doesn’t really matter, because you’re only going to play this for half an hour then delete it and forget about it, right?

2 months later, I had completed over 2,000 levels of Flow Free, insisting on getting a “perfect” rating on every level. The premise of the game — one of the most popular mobile games on both iOS and Android since its 2012 release — is stupidly simple: connect “valves” of different colours on a 2D grid, without 2 lines ever crossing:

Flow Free game screenshot
Flow Free — you can understand the game pretty much from one screenshot

The level in the screenshot may seem simple, but it does get harder. As the levels progressed, I found myself coming up with tactics that helped me solve the advanced levels faster (e.g. sticking to the outermost unfilled boundary whenever possible, avoiding creating “inlets” of unfilled squares, etc.). Browsing online forums, I saw that other players had their own techniques, some similar to mine and some quite different. This raised the question: could a computer, not through brute force but through “experience”, learn these techniques as well?

A human (yours truly) solving a Flow Free puzzle

Start with the basics: A*

Much of the work in an AI problem like this comes down to deciding which algorithm to use. I started off with A* search. Even if it isn’t deep learning per se, it gives a good idea of the inherent complexity of the problem, and gives us a chance to try out a few heuristics that a more advanced algorithm could figure out on its own.

The search space would be far too large to solve the whole board at once, so I started by solving recursively, colour by colour (backtracking to a previous colour if a given path got “blocked”). As a heuristic, I used the trusty Manhattan distance. I tested my algorithm on 4 board sizes: tiny (2x4), small (5x5), medium (8x8), and large (14x14). I made sure the medium and large boards had fewer colours than average, as this actually makes the puzzle more difficult, both for humans and computers (more possible paths/options/states).
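For the curious, here is a minimal sketch of the per-colour A* idea with the Manhattan distance heuristic. The board representation and the `neighbours` callable are simplified placeholders for illustration, not the exact code in my repo:

```python
from heapq import heappush, heappop

def manhattan(a, b):
    """Manhattan distance between two (row, col) cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def astar_one_colour(start, goal, neighbours):
    """A* path for a single colour, from its start valve to its goal valve.

    `neighbours(cell)` should yield the legal adjacent empty cells; the full
    solver calls this colour by colour and backtracks when a colour is blocked.
    """
    frontier = [(manhattan(start, goal), 0, start, [start])]
    seen = set()
    while frontier:
        _, cost, cell, path = heappop(frontier)
        if cell == goal:
            return path
        if cell in seen:
            continue
        seen.add(cell)
        for nxt in neighbours(cell):
            heappush(frontier, (cost + 1 + manhattan(nxt, goal), cost + 1, nxt, path + [nxt]))
    return None  # this colour is blocked: the caller backtracks
```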

This worked fine on the small board, but took a fair bit of time. I therefore added a few pruning rules to the next-state function, rules I hoped the reinforcement learning algorithms would later figure out for themselves: no non-consecutive adjacent squares of the same colour, and no empty “inlets”:

Examples of what’s not permitted with an A* search Flow Free solver
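Concretely, the two pruning checks look something like this. The board is assumed to be a dict mapping filled cells to colours and `open_valves` a set of valve cells whose path isn’t complete yet; these names are mine, chosen for illustration:

```python
def orthogonal_neighbours(cell, size):
    """The up/down/left/right cells that stay on a size x size board."""
    r, c = cell
    for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
        if 0 <= nr < size and 0 <= nc < size:
            yield (nr, nc)

def touches_own_colour(board, size, cell, colour, prev):
    """Rule 1: a newly filled cell may touch its own colour only via the previous path cell."""
    return any(nbr != prev and board.get(nbr) == colour
               for nbr in orthogonal_neighbours(cell, size))

def creates_inlet(board, size, open_valves):
    """Rule 2 (simplified): an empty cell with no empty neighbour and no adjacent
    open valve can never be filled, so any move that creates one gets pruned."""
    for r in range(size):
        for c in range(size):
            if (r, c) in board:          # already filled
                continue
            nbrs = list(orthogonal_neighbours((r, c), size))
            if not any(n not in board for n in nbrs) and not any(n in open_valves for n in nbrs):
                return True
    return False
```

Any candidate successor state that fails either check simply never makes it onto the A* frontier.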

The results on my 7-year-old PC were much more encouraging, but still needed improvement:

Flow Free AI solver A* search results
I’m ashamed to say I probably spent more time on my Tkinter graphing functions than the actual AI

If you think you’re the first to do it, you’re probably wrong

I was playing around with optimising my A* search a bit more before moving on to reinforcement learning, when I discovered this excellent blog post by Matt Zucker, where he had already built an A* solver for Flow Free (glad to see I wasn’t the only one with this obsession) and had thought much more carefully about which states to eliminate from his A* search. What’s more, he had found that a simple SAT-based solver with just 6 rules outperformed his very advanced A* search, with very good solving times (sub-second even for 14x14 boards).

It seemed, so far, we had both arrived at the same discouraging conclusion: a simple brute force technique will outperform basic AI algorithms for this type of (non-adversarial, closed-ended) puzzle game.

Still, there was no use stopping there. After all, the question that launched this exploration still remained: as human players, we discovered specific techniques to beat Flow Free puzzles more efficiently after playing a few levels. Why wouldn’t a machine be able to learn the same techniques?

Cue reinforcement learning

I was very excited to try out Q-learning on Flow Free, as that was really where the “I” in “AI” would start to come into play. The work on the A* search was anything but a waste of time, as its states and successor moves become the state-action space for our Q-learning agent. We take the state to be a combination of which squares on the board are occupied by which colour, and which path (colour) is currently “active”. An action is simply the next square to fill in that path.
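A minimal tabular Q-learning loop over that state-action space looks roughly like this. The state encoding and the α/γ/ε values are illustrative, not the exact ones from my experiments:

```python
import random
from collections import defaultdict

Q = defaultdict(float)                  # Q[(state, action)] -> estimated value
alpha, gamma, epsilon = 0.5, 0.9, 0.1   # learning, discount, exploration rates (illustrative)

def encode_state(filled, active_colour):
    """State = which squares are occupied by which colour, plus the active colour."""
    return (frozenset(filled.items()), active_colour)

def choose_action(state, actions):
    """Epsilon-greedy: explore occasionally, otherwise take the best known action."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, next_actions):
    """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```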

The agent learned quite quickly how to solve the small board (100 iterations in 1.5s) after making some basic errors in the beginning. Looking good so far. On the medium board, however, still no dice after 10,000 iterations, which took 10 minutes:

Flow Free solver Q-learning attempt
Not exactly what you think of when you hear “Artificial Intelligence”

To improve this, I started playing around with the standard Q-learning parameters (learning rate α, discount rate γ, exploration/exploitation rate ε), which didn’t help much, so I turned my attention to the reward function. Aside from playing with the actual reward values, there was essentially one switch to toggle in the reward function (anything more would become too prescriptive, which would go against the whole machine learning exercise): whether or not the agent gets a reward for completing a single path rather than the whole puzzle. Unfortunately, this didn’t make much of a difference.
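For reference, that single switch looks something like this; the reward values here are placeholders, not the ones I actually tuned:

```python
def reward(step_valid, path_completed, puzzle_solved, reward_per_path=True):
    """The one toggle discussed above: optionally reward closing a single path,
    not just solving the whole puzzle. Values are illustrative placeholders."""
    if puzzle_solved:
        return 100.0
    if reward_per_path and path_completed:
        return 10.0
    return -1.0 if step_valid else -10.0   # small step cost; bigger penalty for dead ends
```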

In the end, it was becoming clear that the algorithm was struggling on larger boards simply because the option space was too large. Q-learning does indeed help in this situation vs A* search by making more clever choices, but there were still too many situations that, even after 10,000 iterations, the exact Q-learning agent hadn’t seen yet. The next natural step was therefore:

Approximate Q-learning

The applications of approximate Q-learning are fascinating, particularly in games. Rather than the agent deciding a best action in a given state, the idea is to give it some intuitive features that it can quickly calculate at each step, independently of the exact state (configuration of the board), and let it decide which ones are the most important. This can be a game-changer in games like Pacman (e.g. decide your next move based on the distance to the nearest pellet and to the nearest ghost, rather than an action for every single possible state), or of course Flow Free, where the number of possible states is simply too large for exact Q-learning to be effective.

The trade-off is that it’s now on the developer to pick the “right” features. I narrowed the list down to features I knew were important in many cases (e.g. total remaining Manhattan distance to close each path), and some I suspected were important (but had no way to prove), which I would just let the algorithm figure out. These include:

  • Total remaining Manhattan distance to close each path
  • Number of “turns”
  • Number of “boxes”
  • Number of valves in boxes
  • Number of boxes with no valves (one can prove logically that this should never happen — I wanted to see whether the algorithm could figure it out)
  • Whether a path is “hugging a wall” (i.e. if a path can stick to another adjacent path)
Illustration of approximate Q-learning features for Flow Free solver
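Under the hood, approximate Q-learning replaces the big Q-table with a weighted sum of these features and learns the weights. A minimal sketch, using the standard linear function approximation update (the feature names and example values are mine):

```python
def q_value(weights, features):
    """Approximate Q(s, a): a weighted sum of the hand-picked features listed above."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def update_weights(weights, features, reward, q_sa, max_q_next, alpha=0.1, gamma=0.9):
    """Each weight moves in proportion to its feature's value and the TD error."""
    difference = (reward + gamma * max_q_next) - q_sa
    for name, value in features.items():
        weights[name] = weights.get(name, 0.0) + alpha * difference * value
    return weights

# Example feature vector for one candidate action (values are made up):
features = {
    "remaining_manhattan": 14,
    "n_turns": 3,
    "n_boxes": 1,
    "n_valves_in_boxes": 0,
    "n_boxes_without_valves": 1,   # the feature the agent learned to strongly avoid
    "hugging_wall": 1,
}
```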

Unfortunately, with these features, the approximate Q-learning agent couldn’t solve even the small board, no matter how I varied the Q-learning parameters. It was still an interesting exercise to see it in action, though. For instance, the algorithm attached a strong negative weight to “boxes with no valves”, meaning it was able to learn that having a box with no valves leads to the puzzle not being solved.

The approximate Q-learning agent detecting better playing strategies

Perhaps with a larger sample of puzzles it could have learned to actually solve them, but I was excited to see it picking up on what was important. This is a fascinating and fairly common phenomenon, particularly in game AI: even if an algorithm can’t win the game, it can find techniques that help a human play better.

Going supervised: Convolutional Neural Networks

I was, initially, biased against the idea of a supervised approach to Flow Free. First of all, it would require a large sample of solved Flow Free games, which are difficult to find on the public internet. Second, the supervised approach that seemed to most closely match Flow Free — neural networks — is a notorious black box, which would preclude the most interesting part of the exercise: seeing the techniques the algorithm learns to solve the puzzle.

However, I then came across an article by Shiva Verma in Towards Data Science where he does something fairly similar with Sudoku: he essentially treats the Sudoku board as an image and uses a convolutional neural network (CNN) to solve it. He achieved good results with Sudoku, which made me revisit my initial reservations and try this approach on Flow Free nonetheless.
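To give a rough idea of what such a CNN might look like for Flow Free, here is a sketch in Keras. The layer sizes, the one-hot encoding, and the 5x5/6-colour assumption are all illustrative, not the exact setup in my repo:

```python
from tensorflow import keras
from tensorflow.keras import layers

board_size, n_colours = 5, 6   # assumption: 5x5 boards with up to 6 colours

# Input: the unsolved board as a one-hot "image" (one channel per colour, plus empty).
# Output: a per-cell softmax over the colours, i.e. a predicted solved board.
model = keras.Sequential([
    layers.Input(shape=(board_size, board_size, n_colours + 1)),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.Conv2D(128, 1, activation="relu"),
    layers.Flatten(),
    layers.Dense(board_size * board_size * n_colours),
    layers.Reshape((board_size * board_size, n_colours)),
    layers.Activation("softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```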

The first hurdle was, of course, getting the input data; solved Flow Free puzzles in parseable text format are a lot harder to find than Sudoku puzzles. The best way I found in the beginning was to look into the code of the Android app, which had just over a thousand puzzles stored in text format:

Example of parsing a text solution extracted from the Flow Free Android app into an image

The initial results of the CNN were disappointing: a flat 0% of puzzles from the test sample solved, although the CNN was learning that it should make paths, rather than just fill out colours in isolation.

We need more data

Tweaking layers, epochs, kernel size, and other such usual suspects didn’t help much. It was looking like we were back to a data scientist’s most common problem: the best algorithm in the world is nothing without enough data. The best additional source of data I found was www.flowfreesolutions.com, with solutions to thousands of Flow Free puzzles I didn’t already have… but in image format.

Despite my numerous attempts to contact them through various channels (and even with a financial incentive offered), I was not able to get them to send me a parseable text version of their solutions. Well, this isn’t an AI article for nothing — who needs the underlying solution matrix when one has modern image processing? Cue a sub-project to build a Flow Free solution image processor:

ML image parser for Flow Free solutions
Scikit-image FTW
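The core of the image processor is conceptually simple: sample the colour at the centre of each grid cell and map each distinct colour to a label. A stripped-down sketch, which assumes the grid fills the screenshot exactly (the real parser also has to find the grid’s bounding box and deal with the valve dots):

```python
import numpy as np
from skimage import io

def image_to_grid(path, n_rows, n_cols):
    """Sample the colour at each cell's centre and map distinct colours to integer labels."""
    img = io.imread(path)[:, :, :3]                  # drop any alpha channel
    h, w = img.shape[:2]
    cell_h, cell_w = h / n_rows, w / n_cols
    centres = [[tuple(img[int((r + 0.5) * cell_h), int((c + 0.5) * cell_w)])
                for c in range(n_cols)] for r in range(n_rows)]
    palette = {rgb: i for i, rgb in enumerate(sorted({px for row in centres for px in row}))}
    return np.array([[palette[px] for px in row] for row in centres])
```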

Using symmetry to multiply our available data

This yielded a couple of thousand more data points to work with, but that still really wasn’t much. I then realised that, as far as the CNN is concerned, the exact value of a colour doesn’t matter, only the fact that the colours are different from one another. So we can permute the colours around and get another non-redundant data point. Even for 5x5 boards, which have up to 6 different colours, this gives our CNN as many as 6! = 720 times more data points to work with (and of course even more combinations to choose from for larger boards with more colours):

We can permute colours to create an extra data point for a Flow Free CNN solver
Mathematicians will recognise the symmetric group S_n from Group Theory
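A sketch of the augmentation itself, assuming boards are stored as integer grids where 0 is an empty cell and 1..n are the colours (my repo’s format differs slightly):

```python
import random
import numpy as np

def permute_colours(puzzle, solution, n_colours):
    """Relabel the colours consistently in a (puzzle, solution) pair to get a new data point."""
    perm = list(range(1, n_colours + 1))
    random.shuffle(perm)
    mapping = {0: 0, **dict(zip(range(1, n_colours + 1), perm))}
    relabel = np.vectorize(mapping.get)
    return relabel(puzzle), relabel(solution)
```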

A friend pointed out that this is in fact a common way to increase data points in game AI. With 720x as many data points, we were finally getting somewhere: 12% accuracy on test data with a 20-epoch model and ~200k data points, which took 20 minutes to train on my 7-year-old GPU. Note that we are using a strict criterion for success here: even if the board is off by one cell, we count it as a failure.
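In other words, the metric is plain exact-match accuracy over whole boards, along these lines:

```python
import numpy as np

def exact_match_accuracy(predicted_boards, true_boards):
    """A prediction counts as a success only if every single cell matches."""
    matches = [np.array_equal(p, t) for p, t in zip(predicted_boards, true_boards)]
    return sum(matches) / len(matches)
```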

However, the failures were much more interesting than the successes here. In almost all of the failures, the CNN solved most of the board correctly, enough so that it would be easy for a human to complete it. In this sense, the CNN fulfilled the original premise of this article: it intuitively learned human techniques:

Even when the CNN Flow Free solver gave an incorrect result, it was still very close and “human-like”
Distribution of errors for the CNN of the AI Flow Free solver
Distribution of errors of the CNN

Conclusion

  • Simpler methods typically remain better for solving grid-based, non-adversarial puzzle games. In fact, as the methods got progressively more advanced, their solving performance got worse.
  • However, although more advanced ML methods don’t solve the puzzle as quickly, they do surface interesting insights and help humans reach the solution (the convolutional neural network did this best). Moreover, their performance scales better than more traditional solutions.
  • Better data beats better algorithms.

Going further: reader and editor suggestions

I asked some more experienced AI/ML experts (a big thank you to Matt Zucker, Youssef Keyrouz, and Mark Saroufim) to review this article and they suggested trying the following ideas to improve the CNN algorithm. This may be the subject of a Part 2 article, or you can feel free to try them yourselves (as well as the approaches detailed in this article) on https://github.com/kgaspard/flow-free-ai:

  • Changing the number of layers in the CNN (ablation did not seem to help)
  • In addition to using symmetry to multiply our data points, also using rotations/reflections
  • Using the CNN predictions as features for the reinforcement learning agent
