Tic-Tac-Toe Learner AI

Ashik Poovanna
Towards Data Science
3 min read · Aug 3, 2019


Tic-Tac-Toe is a simple two-player game that many of us enjoyed playing as kids (especially in boring classrooms). The two players take turns placing their respective symbols on a 3x3 grid. The player who manages to place three of their symbols in a horizontal, vertical, or diagonal row wins the game; if neither player manages to do so, the game ends in a draw. If both players always play their optimal strategies, the game always ends in a draw.

An example Tic-Tac-Toe board

Since the grid is small and only two players are involved, the number of possible moves from any board state is limited. This makes tree-based search algorithms such as Minimax with Alpha-Beta pruning computationally feasible, and they yield an exact solution for building a computer Tic-Tac-Toe player.
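For comparison, here is a minimal sketch of such an exact search, Minimax with Alpha-Beta pruning over a board encoded as a list of nine 'X'/'O'/' ' cells. The encoding and helper names are my own illustrative choices, not taken from the article's code.

```python
# Minimal Minimax with Alpha-Beta pruning for Tic-Tac-Toe (illustrative sketch).
# Board: list of 9 cells, each 'X', 'O', or ' '.

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def alphabeta(board, player, alpha=-2, beta=2):
    """Minimax value of `board` with `player` to move (+1 X wins, -1 O wins, 0 draw)."""
    w = winner(board)
    if w == 'X':
        return 1
    if w == 'O':
        return -1
    if ' ' not in board:
        return 0  # draw
    for i in range(9):
        if board[i] != ' ':
            continue
        board[i] = player                       # try the move
        score = alphabeta(board, 'O' if player == 'X' else 'X', alpha, beta)
        board[i] = ' '                          # undo it
        if player == 'X':
            alpha = max(alpha, score)
        else:
            beta = min(beta, score)
        if alpha >= beta:
            break                               # prune the remaining moves
    return alpha if player == 'X' else beta

# From the empty board, optimal play ends in a draw (value 0).
print(alphabeta([' '] * 9, 'X'))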

In this article, we look at an approximate (learning-based) approach to the same game. Even though an exact algorithm exists (Minimax with Alpha-Beta pruning), the approximate approach offers an alternative that could become useful if the complexity of the board were to increase, and the code changes needed to accommodate that would be minimal.

The idea is to pose the Tic-Tac-Toe game as a well-posed learning problem, as described in Chapter 1 of Tom Mitchell’s Machine Learning book. The learning system is described briefly in the following section.

Tic-Tac-Toe Learning System

The basic idea behind the learning system is that it should improve its performance (P) on a set of tasks (T) by learning from training experience (E). The training experience (E) can be direct feedback (a predefined set of examples with individual labels) or indirect feedback (no label for each individual training example). In our case:

  • Task (T): Playing Tic-Tac-Toe
  • Performance (P): Percentage of Games won against Humans
  • Experience (E): Indirect feedback via solution trace (Game History) generated from games played against itself (a clone)

Learning from Experience (E): Ideally, we would learn a function (the ideal target function) that gives the best possible move for any given board state. In our problem, we represent this ideal target function as a linear function V that maps a given board state to a real-valued score, and we use an approximation algorithm (Least Mean Squares) to estimate it from the solution traces. The steps are listed below, followed by a small code sketch.

  1. V(boardState) → R, (R - real-valued score for the given boardState.)
  2. V_hat(boardState) ← (W.T) * X, (W - weights of the target function, X - features extracted from the given boardState.)
  3. LMS training rule to update the weights: Wi ← Wi + lr * (V_train(boardState) - V_hat(boardState)) * Xi, (Wi, Xi - the ith weight and feature, lr - learning rate, V_train - the training value defined in steps 4 and 5 below.)

The training value (V_train) of each non-final board state is set to the estimated score of its successor board state; the final board state is assigned a score based on the end result of the game.

  4. V_train(boardState) ← V_hat(Successor(boardState))
  5. V_train(finalBoardState) ← 100 (Win) | 0 (Draw) | -100 (Loss)
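To make steps 1 through 5 concrete, here is a small Python sketch. It assumes a board encoded as nine 'X'/'O'/' ' cells and a hand-picked feature set of my own; the actual features and code in the linked gist may differ.

```python
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def features(board, me='X', opp='O'):
    """Hand-crafted features x0..x4 (x0 is a bias term); purely illustrative."""
    x = [1.0, 0.0, 0.0, 0.0, 0.0]
    for line in WIN_LINES:
        cells = [board[i] for i in line]
        mine, theirs = cells.count(me), cells.count(opp)
        if mine == 2 and theirs == 0:
            x[1] += 1   # lines where I am one move from winning
        if theirs == 2 and mine == 0:
            x[2] += 1   # lines where the opponent threatens to win
        if mine == 3:
            x[3] += 1   # my completed line (win)
        if theirs == 3:
            x[4] += 1   # opponent's completed line (loss)
    return x

def v_hat(weights, board):
    """Step 2: V_hat(boardState) = W.T * X."""
    return sum(w * xi for w, xi in zip(weights, features(board)))

def lms_update(weights, trace, final_score, lr=0.05):
    """Steps 3-5: walk the solution trace, set the training value of each
    non-final state to V_hat of its successor, and nudge the weights toward it."""
    for t, board in enumerate(trace):
        if t + 1 < len(trace):
            v_train = v_hat(weights, trace[t + 1])   # step 4
        else:
            v_train = final_score                    # step 5: 100 / 0 / -100
        error = v_train - v_hat(weights, board)
        x = features(board)
        for i in range(len(weights)):
            weights[i] += lr * error * x[i]          # step 3 (LMS rule)
    return weights

# Usage: one made-up trace of board states ending in a win for X.
trace = [list(' ' * 9),
         list('X   O    '),
         list('X X O  O '),
         list('XXX O  O ')]
weights = [random.uniform(-0.5, 0.5) for _ in range(5)]
print(lms_update(weights, trace, final_score=100))
```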

Implementation

The final design is split into four modules, following Chapter 1 of Tom Mitchell’s Machine Learning book (a skeleton of the resulting training loop is sketched after the list):

  1. Experiment Generator: Its job is to generate a new problem statement at the beginning of every training epoch. In our case, it simply returns the empty initial board state.
  2. Performance System: This module takes as input the problem provided by the Experiment Generator and uses the current learned target function to produce a solution trace of the game at every epoch. In our case, this is done by simulating a Tic-Tac-Toe game between two players (cloned programs), each of which makes its move decisions using the current target function.
  3. Critic: This module takes the solution trace and outputs a set of training examples to be fed to the Generalizer.
    Training Examples ← [<Features(boardState), V_train(boardState)>, …]
  4. Generalizer: This module uses the training examples provided by the Critic to update/improve the target function, learning the weights with the LMS weight-update rule at each epoch.
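Tying the four modules together, the training loop might look roughly like the skeleton below. The feature set, exploration scheme, and function names are illustrative assumptions rather than the implementation from the linked gist, and the Critic is simplified to use the immediate next board state as the successor.

```python
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def features(board, me):
    # Toy features: bias, my immediate threats, opponent's immediate threats.
    opp = 'O' if me == 'X' else 'X'
    x = [1.0, 0.0, 0.0]
    for line in WIN_LINES:
        cells = [board[i] for i in line]
        x[1] += cells.count(me) == 2 and cells.count(opp) == 0
        x[2] += cells.count(opp) == 2 and cells.count(me) == 0
    return x

def v_hat(w, board, me):
    return sum(wi * xi for wi, xi in zip(w, features(board, me)))

# 1. Experiment Generator: a fresh, empty board each epoch.
def experiment_generator():
    return [' '] * 9

# 2. Performance System: self-play between two clones using the current
#    target function; returns the solution trace and the game result.
def performance_system(w):
    board, trace, player = experiment_generator(), [], 'X'
    while winner(board) is None and ' ' in board:
        moves = [i for i in range(9) if board[i] == ' ']
        def score(i):
            board[i] = player
            s = v_hat(w, board, player)
            board[i] = ' '
            return s
        pick = random.choice(moves) if random.random() < 0.1 else max(moves, key=score)
        board[pick] = player            # mostly greedy, occasionally exploratory
        trace.append(board[:])
        player = 'O' if player == 'X' else 'X'
    return trace, winner(board)

# 3. Critic: turn the trace into <features, training value> pairs (X's view).
#    Simplified: the book uses the next state where it is again the learner's turn.
def critic(w, trace, result):
    final = 100 if result == 'X' else -100 if result == 'O' else 0
    examples = []
    for t, board in enumerate(trace):
        target = v_hat(w, trace[t + 1], 'X') if t + 1 < len(trace) else final
        examples.append((features(board, 'X'), target))
    return examples

# 4. Generalizer: LMS weight update over the training examples.
def generalizer(w, examples, lr=0.01):
    for x, target in examples:
        error = target - sum(wi * xi for wi, xi in zip(w, x))
        w = [wi + lr * error * xi for wi, xi in zip(w, x)]
    return w

w = [0.0, 0.0, 0.0]
for epoch in range(500):
    trace, result = performance_system(w)
    w = generalizer(w, critic(w, trace, result))
print(w)
```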

The following is a short video showing the training & testing phases (by the way, my video editing skills are not great).

Screen Recording of Training & Testing phases of the game

The code for the above implementation can be found in this GitHub gist: https://gist.github.com/NoblesseCoder/cccf260ecc3e2052a0a1a013a7f7ac54
