Creating a platform for AI competitions in Python

Elaborating upon the most important aspects encountered during the development of a platform for an AI competition.

Gilles Vandewiele
Towards Data Science


Last year and this year, I wrote a blog post about my solution to an AI competition that my colleagues (Elias, Pieter, Ozan and Cedric) and I created specifically for first-year engineering students. In this competition, players write their own agent, which then plays a game against the agents of other players. To facilitate the competition, we built a platform that allows students to register, upload their code and check their agent's position on a leaderboard. The leaderboard is constantly updated based on the results of games that are simulated periodically. This blog post discusses the creation of that platform, called Hexatron this year, in detail, and is organized as follows:

  1. High-level overview and used technologies
  2. A detailed discussion of our simulator
  3. A discussion of our database technology and data scheme
  4. Match-making and rating the skill of agents
  5. An overview of our front-end

Please note that this is only the second version of our platform. There is definitely room for improvement! Therefore, any kind of feedback is more than welcome and greatly appreciated.

This year we even designed our own logo!

High-level architecture and used technologies

A high level overview of the different components in our system, and how they interact, can be found below:

A (very) high-level overview of the architecture, consisting of three main modules: (i) a front-end for user registration, agent submission and display of our data, (ii) a SQL database with our data, and (iii) a runner that periodically samples two players from the database and simulates a game between them. Arrows depict the flow of data.

We can distinguish four large components. The central component is our back-end: an SQL-based data store. Data concerning users and submitted agents is gathered through the front-end and stored in the back-end. Moreover, all of the data is displayed through the front-end application. Periodically, a Sampler component extracts pairs of users from the database in order to simulate a game between them, in Docker containers, using the Simulator component. The pairs are formed so as to ensure competitive, exciting matches.
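To make the interaction between these components concrete, the runner essentially boils down to a loop like the sketch below (the module and function names are hypothetical placeholders for the pieces discussed in the remainder of this post):

```python
import time

# Hypothetical helper modules, standing in for the components
# discussed in the rest of this post.
from storage import load_players, store_game_and_ratings   # SQL back-end
from matchmaking import make_pairs                          # Blossom-based Sampler
from simulator import simulate_game                         # Docker-based Simulator

SIMULATION_INTERVAL = 10 * 60  # seconds between two rounds of games

while True:
    players = load_players()
    # Pair up players such that the matches are as competitive as possible.
    for player_a, player_b in make_pairs(players):
        # Spin up the two agent containers and play a full game.
        result = simulate_game(player_a, player_b)
        # Persist the Game object and the updated ratings of both players.
        store_game_and_ratings(player_a, player_b, result)
    time.sleep(SIMULATION_INTERVAL)
```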

Agent compliance & docker-based simulations

As the games are simulated server-side, and because it is possible to cause harm to the hosting machine with Python scripts (imagine shutil.rmtree('/') or os.system('sudo smth')), some measures were taken to ensure that it would be impossible (or at least very hard) to submit an agent to our system and effectively cause harm to our server. This was realized through some initial checks when code is submitted to our platform, and by trying to simulate games in a “sandbox” environment.

Each participant needs to implement a function that accepts a certain game state as input and outputs the move that the agent wants to make. Each time a participant submits newly written code to our platform, we perform a few checks. First, we check that the size of the uploaded file does not exceed one megabyte. This is to avoid users submitting solutions that would allocate too much space on our hard disk, and to avoid hard-coding the optimal move for every possible game state. Second, we only allow NumPy and certain packages from the standard library to be used. Third, and finally, no top-level execution code is allowed (all code has to be wrapped in functions) and some constructs are forbidden, including eval, exec, open, file, subprocess, dircache, and more. If all these checks succeed, the latest submission of the corresponding user is updated to the new code, which will be used in the next simulation.
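To give an idea of what such checks could look like, below is a minimal sketch based on Python's ast module (the module whitelist, the forbidden names and the exact rules are only illustrative and differ from our actual implementation):

```python
import ast
import os

MAX_SIZE_BYTES = 1024 * 1024  # the one-megabyte upload limit
ALLOWED_MODULES = {'numpy', 'math', 'random', 'itertools', 'collections'}
FORBIDDEN_NAMES = {'eval', 'exec', 'open', 'file', '__import__'}


def check_submission(path):
    """Return a list of problems found in a submitted agent file."""
    problems = []

    # 1. File size check: no huge, hard-coded lookup tables.
    if os.path.getsize(path) > MAX_SIZE_BYTES:
        problems.append('file is larger than 1 MB')

    with open(path) as f:
        tree = ast.parse(f.read())

    for node in ast.walk(tree):
        # 2. Only whitelisted imports (NumPy + parts of the standard library).
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split('.')[0] not in ALLOWED_MODULES:
                    problems.append('forbidden import: ' + alias.name)
        elif isinstance(node, ast.ImportFrom):
            if (node.module or '').split('.')[0] not in ALLOWED_MODULES:
                problems.append('forbidden import: ' + str(node.module))
        # 3. Forbidden constructs such as eval, exec and open.
        elif isinstance(node, ast.Name) and node.id in FORBIDDEN_NAMES:
            problems.append('forbidden construct: ' + node.id)

    # 4. No top-level execution code: only imports, defs and docstrings.
    allowed_top_level = (ast.Import, ast.ImportFrom,
                         ast.FunctionDef, ast.ClassDef, ast.Expr)
    if any(not isinstance(node, allowed_top_level) for node in tree.body):
        problems.append('top-level code outside of a function')

    return problems
```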

The simulations are performed using Docker containers. Each time a new agent is submitted to our platform, a Docker container is created for that agent in which a Flask server is running. This Flask server exposes a POST route, which accepts a game state and returns the agent's move. While Docker containers are not ideally suited as sandbox environments, they do make it much harder to break the system through submitted code.
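As an illustration, the Flask server inside such a container could look roughly like the sketch below (it assumes the submitted code is available as a module called agent exposing a get_move function; the route name and payload format are made up for this example):

```python
from flask import Flask, request, jsonify

import agent  # the participant's submitted code, copied into the container

app = Flask(__name__)


@app.route('/move', methods=['POST'])
def move():
    # The simulator POSTs the current game state as JSON ...
    game_state = request.get_json()
    # ... and the container replies with the move chosen by the agent.
    return jsonify({'move': agent.get_move(game_state)})


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```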

Back-end technology and data scheme

For our back-end, we used an SQL database. While it takes a little more development effort than, for example, a simple MongoDB instance, it is often a lot more efficient. Our data scheme (which is probably more complex than it needs to be) is depicted below:

The data scheme used in our SQL store.

The central objects within our back-end are the User, Submission and Game. For each agent a User submits to our platform, a Submission object is created and added to the database. Moreover, this newly added Submission is used to overwrite the user's ActiveSubmission. Our runner periodically samples pairs of User objects and runs a simulation between them. The results of that simulation are stored in a newly added Game object, which in turn results in the creation of a new Rating object representing each user's updated rank. While there is no real difference between the ActiveSubmission and Submission objects, we did make the distinction so that we could later add functionality to switch between submitted agents.
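For illustration, a heavily trimmed-down version of these central objects could be declared as follows (assuming SQLAlchemy's declarative models; the actual scheme contains more columns and tables):

```python
from datetime import datetime

from sqlalchemy import Column, DateTime, Float, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True, nullable=False)
    # The submission that is currently used in simulations.
    active_submission_id = Column(Integer, ForeignKey('submissions.id'))


class Submission(Base):
    __tablename__ = 'submissions'
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey('users.id'), nullable=False)
    code_path = Column(String, nullable=False)
    created_at = Column(DateTime, default=datetime.utcnow)


class Game(Base):
    __tablename__ = 'games'
    id = Column(Integer, primary_key=True)
    player1_id = Column(Integer, ForeignKey('users.id'), nullable=False)
    player2_id = Column(Integer, ForeignKey('users.id'), nullable=False)
    winner_id = Column(Integer, ForeignKey('users.id'))  # NULL in case of a draw
    played_at = Column(DateTime, default=datetime.utcnow)


class Rating(Base):
    __tablename__ = 'ratings'
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey('users.id'), nullable=False)
    game_id = Column(Integer, ForeignKey('games.id'), nullable=False)
    value = Column(Float, nullable=False)  # the updated rank after the game
```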

Match-making and skill rating

Two important aspects of the competition are how the ranking of an agent can be determined as accurately as possible and how matches can be kept as competitive as possible. Again, existing algorithms and technologies were used to solve both problems.

In order to rank the agents, we first applied the TrueSkill™ system, developed by Microsoft. TrueSkill is similar to the well-known Elo system (originating from chess), but comes with a few extra advantages, such as being suited for games other than one versus one and naturally handling player inactivity (whereas in the Elo system, inactive players would just remain around the same rank). While these advantages are not really relevant in our scenario (the games are one versus one and inactivity does not occur because of the periodic sampling), the TrueSkill system is somewhat more recent and was therefore worth a shot. Unfortunately, we noticed that the TrueSkill rating changes far too slowly, making it hard for people who joined the competition later than others to achieve a high rank. Therefore, we switched back to the Elo system halfway through the competition.
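The Elo update itself is straightforward. As a sketch (using the common K-factor of 32, not necessarily the value we used):

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=32):
    """Return the new ratings after a game.

    score_a is 1 if A won, 0 if A lost and 0.5 for a draw.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1 - score_a) - (1 - exp_a))
    return new_a, new_b


# Example: a 1500-rated agent beats a 1600-rated agent.
print(update_elo(1500, 1600, 1))  # -> (approximately 1520.5, 1579.5)
```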

To ensure competitive matches, we used an existing algorithm called Blossom, which finds a maximum matching in a graph. More formally: given a graph G = (V, E), the algorithm finds a matching M (a subset of the edges) such that each vertex in V is incident with at most one edge in M and |M| (or, in our case, the sum of the edge weights) is maximized. The problem is very similar to the bipartite graph matching problem, which can be solved with the Hungarian algorithm (the first thing we looked into), but the latter creates a matching between two different sets of vertices (often representing, for example, people and jobs). As an example, assume we have four players in our system. One player (A) has a very strong agent, one player (B) has a very weak agent, and the two other players (C, D) have average agents. We can then represent our leaderboard as a weighted graph: the graph is fully connected, each vertex corresponds to a player, and the weight of each edge corresponds to the match quality between the two players it connects (the more evenly matched the players are, the higher the quality):

Representing our leaderboard of four players in a graph. The width of the edges depicts the match quality.

The Blossom algorithm would, in this case, return two edges such that the sum of their weights is maximized and such that each vertex in our graph is incident to exactly one of those edges. This is exactly what we want: each player is matched against another player and the total sum of match qualities is maximized. Of course, this is only possible when the number of vertices is even. In case of an odd number of players, one player is sampled uniformly at random and removed from the graph.
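In practice, we did not need to implement the Blossom algorithm ourselves; a library implementation such as networkx's max_weight_matching does the job. A sketch for the four-player example above, with made-up match-quality weights:

```python
import networkx as nx

# Fully connected graph over the four players from the example; the edge
# weights encode match quality (higher = more competitive pairing).
G = nx.Graph()
G.add_weighted_edges_from([
    ('A', 'B', 0.1),  # very strong versus very weak: a poor match
    ('A', 'C', 0.6),
    ('A', 'D', 0.6),
    ('B', 'C', 0.6),
    ('B', 'D', 0.6),
    ('C', 'D', 0.9),  # two average agents: a great match
])

# Blossom-based maximum-weight matching; maxcardinality=True ensures that
# every player gets paired when the number of players is even.
pairs = nx.max_weight_matching(G, maxcardinality=True)
print(pairs)  # e.g. {('A', 'C'), ('B', 'D')}
```

Note how the matching avoids the greedy choice of pairing C with D, since that would force the uncompetitive A versus B game and lower the total match quality.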

Displaying everything in the front-end

The front-end was mainly created using Flask. Each page of our platform corresponds to a route. When a user visits that route, the relevant objects are retrieved from the back-end and provided to a Jinja template, which is then rendered. For the interactive parts of our application (such as the slider to filter games within a certain date range and the different visualizations), we used JavaScript and D3. Some screenshots of the application can be found below, followed by a small sketch of what such a route looks like:

Screenshot of the leaderboard page
Screenshot of the game history page
Creating a new account
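To give an idea of how such a page is wired up, here is a minimal sketch of a leaderboard route (the dummy rows stand in for the objects that the real platform fetches from the SQL back-end, and leaderboard.html is a hypothetical Jinja template):

```python
from flask import Flask, render_template

app = Flask(__name__)


@app.route('/leaderboard')
def leaderboard():
    # In the real platform these objects are retrieved from the back-end;
    # a few dummy rows illustrate the Flask/Jinja pattern here.
    users = [
        {'name': 'agent_smith', 'rating': 1620},
        {'name': 'deep_thought', 'rating': 1540},
    ]
    users.sort(key=lambda u: u['rating'], reverse=True)
    # The Jinja template renders the sorted rows as an HTML table.
    return render_template('leaderboard.html', users=users)
```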

Concluding words

So this is it. I have now given a quick overview of the aspects and components that required a significant part of our time during the development of our platform. I hope this can be helpful to people out there who are currently trying to create something similar. As always: if anything is unclear, if there are things you would like me to further elaborate upon, or if you have suggestions or feedback, please do not hesitate to contact me! You can do this either by commenting on this post or by sending me an email.

Perhaps next year, I can write another post about the changes we make in the third version of our yearly AI competition. See you then,

Gilles, Elias, Pieter, Ozan and Cedric
