
Introduction:
After making a MATLAB program that can play Connect 4 at a decent skill level, I decided that the next logical step was to design a program that can learn how to play Tic-tac-toe. After all, it's just a simplified version of Connect 4, so the machine should take less time to learn and improve its winning rate.
Tic-Tac-Toe Machine V1.0
It starts with 2 completely random Tic-tac-toe move generators playing each other. After a game is finished, one of the machines takes all of the decision data it collected from the game and stores it in its database. If a move resulted in a win, that move gains 3 points; if it resulted in a loss, it loses 1 point; a tie game earns 1 point. Over a period of 100,000 games, each unique scenario is documented in its database, along with the moves it chose in that scenario. Moves with higher scores for a given scenario are more likely to be picked in later games, while moves with lower scores are less likely.
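The scoring scheme above can be sketched in a few lines of code. The original project was written in MATLAB; this is a hypothetical Python re-creation, with the class name, the string encoding of the board, and the uniform starting weights all being my own assumptions rather than details from the original.

```python
import random

WIN, TIE, LOSS = 3, 1, -1  # rewards described in the text

class ScenarioDatabase:
    """Maps each board scenario to a score for each of the 9 squares."""

    def __init__(self):
        self.scores = {}  # board state (a 9-char string) -> list of 9 move scores

    def pick_move(self, state, legal_moves):
        """Pick a legal move with probability proportional to its score."""
        weights = self.scores.setdefault(state, [1] * 9)  # unseen scenario: uniform
        w = [max(weights[m], 0.01) for m in legal_moves]  # keep weights positive
        return random.choices(legal_moves, weights=w)[0]

    def update(self, history, reward):
        """After a game, credit every (scenario, move) decision with the reward."""
        for state, move in history:
            self.scores.setdefault(state, [1] * 9)[move] += reward

# A won game: every decision made along the way gains 3 points.
db = ScenarioDatabase()
db.update([("---------", 4), ("-X--O----", 0)], WIN)
```

Under this update rule, a move that keeps winning accumulates a large score and dominates the weighted draw for its scenario, which is exactly the "refining the weights" behavior described below.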
The idea is that over the course of several thousand games, moves that resulted in wins or ties gain more points and become more likely to be picked. The result was a database of over 2,000 unique game scenarios, with different weights for each move. Below is a plot of the total number of unique scenarios in the database over the course of the 100,000 games.

As you can see, the number converges on 2,423, and at that point the machine starts seeing recurring scenarios and refines the weights for each of the 9 possible moves.
But how did TTT-M1's winning percentage look over the 100,000 games? Below on the left is the overall winning percentage for all simulations, and on the right is a zoomed-in view of the first few thousand games.


Since both TTT-M1 and its random-move opponent start with zero knowledge of the game, the win percentage fluctuates greatly over the first 2,000 games. At some point, it settles into a smooth line that slowly increases over time. This is because the machine has now seen all the possible scenarios and is refining the weights for each move.
It was at this point that I thought I had a completed project. There was no way anyone would be able to beat a machine with 100,000 games of Tic-tac-toe experience. Well, I thought wrong.
It turns out that while 100,000 games is a lot, it means little when every one of them is played against a random move generator. A real person has intuition, which is something my machine's opponent did not. As such, it was quite easy to beat this machine, or at least tie it. That does not mean the machine hadn't learned some techniques for blocking its opponent. It was definitely making smart decisions; it just had never faced a human opponent with decent logic.
Tic-Tac-Toe Machine V2.0
The solution to this was rather simple. Have another machine start with no knowledge of the game whatsoever, and have it face my Tic-tac-toe machine version 1.0 (TTT-M1). Using the same code as before, this new machine (TTT-M2) would learn how to play Tic-tac-toe over 150,000 games, but against an opponent with a decent amount of skill. Theoretically, TTT-M2 should end up smarter than TTT-M1, since it has thousands of games to expose TTT-M1's weaknesses while still developing new strategies of its own.
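The generational setup can be sketched end to end: a frozen earlier generation plays first and never learns, while the newer generation updates its weights after every game. Again, the original was MATLAB; this Python version is a self-contained illustration, and the `Agent`/`play` names, the string board encoding, and the tiny game engine are my own assumptions.

```python
import random

# The 8 winning lines of a 3x3 board, indexed 0-8 row by row.
LINES = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != "-" and board[a] == board[b] == board[c]:
            return board[a]
    return None

class Agent:
    def __init__(self, learning=True):
        self.scores, self.learning = {}, learning

    def move(self, board):
        legal = [i for i, s in enumerate(board) if s == "-"]
        w = self.scores.setdefault(board, [1.0] * 9)
        return random.choices(legal, weights=[max(w[i], 0.01) for i in legal])[0]

    def update(self, history, reward):
        if self.learning:  # the frozen generation ignores its results
            for state, mv in history:
                self.scores.setdefault(state, [1.0] * 9)[mv] += reward

def play(first, second):
    """One game; only `second` (the learner) is credited: +3 win, +1 tie, -1 loss."""
    board, players, histories = "-" * 9, (first, second), ([], [])
    for turn in range(9):
        p = turn % 2
        mv = players[p].move(board)
        histories[p].append((board, mv))
        board = board[:mv] + "XO"[p] + board[mv + 1:]
        if winner(board):
            break
    w = winner(board)
    reward = 1 if w is None else (3 if w == "O" else -1)
    second.update(histories[1], reward)
    return reward

frozen, learner = Agent(learning=False), Agent(learning=True)
results = [play(frozen, learner) for _ in range(2000)]
```

Note that the learner plays second here, mirroring TTT-M2 facing a first-moving TTT-M1; only its database changes between games, so any improvement in `results` comes from exploiting the fixed opponent.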
One situation I was particularly interested to see play out was the first-move decision. TTT-M1 got the first move, and I expected it to converge on the middle square as the best choice. It turns out it actually preferred the corners as its starting spot. This left the middle open for the second player, leading to TTT-M1 losing many of its games against real people. I wanted to see whether TTT-M2 would pick up on this weakness and start taking the middle square whenever it was open on its turn.
Here is the learning rate for TTT-M2 against TTT-M1:

As you can see, the win percentage started out very low, since TTT-M2 was facing an opponent with some knowledge of the game. As the simulations went on, it learned the weaknesses of TTT-M1 until it converged to a winning rate of around 0.50. This is very good considering it did not have the first move.
It had also learned that if TTT-M1 did not take the middle on the first move, then it should take it. This is consistent with the well-known fact that the middle square is the best spot to hold in Tic-tac-toe.
Now that I had a decent opponent, I developed TTT-M3.
Tic-Tac-Toe Machine V3.0
The final evolution of my Tic-tac-toe machine learning project is TTT-M3. This machine faced off against TTT-M2 over 150,000 games, learning the best strategies while also exploiting the remaining holes in TTT-M2's database. After 150,000 games, here are the results:

This shows that the final generation of this Tic-tac-toe machine learning project started out very weak. It hovered around a 25% win rate in the first 1,000 games, but then took off and reached around 74%. This is against an opponent with some Tic-tac-toe logic as well (TTT-M2). Here is an example game I played against TTT-M3 ("2" marks TTT-M3's moves, and "1" marks mine):

I noticed right away that it had developed a strategy to lure me into picking the center spot. Once I did that, it was a tie game every time, which, in the computer's mind, is a good result.

I played the middle spot, falling for the trap. It then set me up to take the top-left corner.

After I blocked its horizontal attempt up top, it set me up again on the rightmost column.

Its next move, in the second row and first column, set up the tie, sealing the fate of this match.

Conclusions:
After only 3 generations of evolution, the program had developed strategies to set up ties, block the opponent, and go for the win. Overall, this project was a success: a program was created that can learn new skills and get better at Tic-tac-toe by playing against other opponents.
If this type of content interests you, I have an assortment of articles on board games and Data Science: