Applied Computer Vision

Computer vision and the ultimate pong AI

Using Python and OpenCV to play pong online

Robin T. White, PhD
Towards Data Science
7 min read · Aug 5, 2020


AI in action (right paddle)

One of my favourite YouTubers, CodeBullet, once attempted to create a pong AI to rule them all. Sadly he ran into trouble, not because he isn’t capable, but because his experience at the time didn’t include much in the way of computer vision. He is absolutely hilarious and I highly recommend you watch him (parental advisory advised) if you are at all considering reading the rest of this post. Also, he is a genius at what he does. Love you mate. See his video here.

This seemed like a really fun and simple task so I had to give it a go. In this post I will outline some of the considerations I took that may help if you wish to work on any similar project, and I think I will try my hand at a few more of these, so if you like this type of thing consider following me.

The nice thing about using computer vision is that I can just use an already-built game and process the images. Having said that, we will be using the same version of the game CodeBullet used, from ponggame.org. It also has a 2-player mode, so I can play against my own AI, which I did, and it was hard…

Capturing the screen

First things first: getting the screen. I wanted my frame rate to be as fast as possible, and for this I found MSS to be a great Python package. With it I was easily maxing out at 60 fps, whereas with PIL I was only getting about 20 fps. It returns a numpy array, so my life was complete.
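For reference, a minimal sketch of the capture step might look like the following. The helper names and region coordinates are my own placeholders, not from the original source:

```python
import numpy as np

def game_region(left, top, width, height):
    """Build the monitor-region dict that mss.grab() expects."""
    return {"left": left, "top": top, "width": width, "height": height}

def capture_frame(region):
    """Grab one frame of the given region and return it as a numpy array (BGRA)."""
    from mss import mss  # imported here so the helper above stays dependency-free
    with mss() as sct:
        return np.array(sct.grab(region))
```

In a real loop you would keep a single `mss()` instance alive across frames rather than reopening it each time, which is part of how it stays fast.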

Paddle detection

Working our way in order of simplicity, we need to define the paddle locations. This could be done in a few different ways but I thought the most obvious was to mask the area for each paddle and run connected components to find the paddle object. Here is a snippet of that code:

import cv2
import numpy as np

def get_objects_in_masked_region(img, vertices, connectivity=8):
    ''':return: connected components with stats in masked region
    [0] retval: number of total labels, 0 is background
    [1] labels: image
    [2] stats: [0] leftmost x, [1] topmost y, [2] horizontal size, [3] vertical size, [4] area
    [3] centroids
    '''
    mask = np.zeros_like(img)
    # fill the mask
    cv2.fillPoly(mask, [vertices], 255)
    # now only show the area that is the mask
    mask = cv2.bitwise_and(img, mask)
    conn = cv2.connectedComponentsWithStats(mask, connectivity, cv2.CV_16U)
    return conn

In the above, ‘vertices’ is just a list of the coordinates that define the masked region. Once I have the objects within each region I can get their centroid positions or bounding boxes. One thing to note is that OpenCV includes the background as the 0th object in any connected-component list, so in this case I always grabbed the second-largest object. The result is below: the paddle on the right with the green centroid is the player / soon-to-be AI-controlled paddle.
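As a rough illustration of the “grab the second-largest object” step, here is a small helper operating on the stats array returned by `cv2.connectedComponentsWithStats` (area lives in column 4, i.e. `cv2.CC_STAT_AREA`). The function name is mine; taking the largest non-background component is equivalent to taking the second-largest overall:

```python
import numpy as np

CC_STAT_AREA = 4  # column index of area in OpenCV's stats array (cv2.CC_STAT_AREA)

def largest_foreground_label(stats):
    """Return the label of the largest component, skipping background label 0."""
    if stats.shape[0] < 2:
        return None  # only the background is present
    # offset by 1 because we sliced off the background row
    return 1 + int(np.argmax(stats[1:, CC_STAT_AREA]))
```

Usage would then be something like `label = largest_foreground_label(conn[2])` followed by reading the centroid from `conn[3][label]`.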

Result of paddle detection

Moving the paddle

Now that we have our output, we need an input. For this I turned to a useful package and someone else’s code — thanks StackOverflow. It uses ctypes to simulate keyboard presses and in this case, the game is played using the ‘k’ and ‘m’ keys. I got the Scan Codes here. After testing that it worked by just randomly moving up and down, we are good to start tracking.
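The actual key injection goes through the ctypes snippet, but the steering decision itself is simple enough to sketch. The function name and dead zone are my own assumptions, as is which of ‘k’ and ‘m’ moves the paddle up versus down:

```python
def choose_key(paddle_y, target_y, dead_zone=10):
    """Return which key to hold: 'k' (up), 'm' (down), or None inside the dead zone.

    The dead zone stops the paddle from jittering when it is already
    close enough to the target y-position.
    """
    if target_y < paddle_y - dead_zone:
        return "k"
    if target_y > paddle_y + dead_zone:
        return "m"
    return None
```

Each frame you would release the previously held key and press whichever one this returns via the ctypes helper.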

Pong detection

Next up is to identify and track the pong. Again, this could have been handled in several ways. One would have been object detection using a template, but instead I again went with connected components and object properties, namely the area of the pong, since it is the only object with its dimensions. I knew I would run into issues whenever the pong crossed or touched any of the other white objects, but I also figured this was fine so long as I could track it the majority of the time. After all, it moves in a straight line. If you watch the video below you will see how the red circle marking the pong flickers. That is because it only finds it about one in every two frames. At 60 fps this really doesn’t matter.
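A sketch of that area-based filter: scan the non-background components and return the centroid of the first one whose area falls in the pong’s expected range. The area bounds here are made-up placeholders you would tune for your screen resolution:

```python
import numpy as np

CC_STAT_AREA = 4  # cv2.CC_STAT_AREA

def find_pong(stats, centroids, min_area=20, max_area=120):
    """Return the centroid of the first component whose area matches the pong, else None."""
    for label in range(1, stats.shape[0]):  # skip background label 0
        if min_area <= stats[label, CC_STAT_AREA] <= max_area:
            return tuple(centroids[label])
    return None
```

Returning `None` on a miss is what makes the occasional lost frame harmless: the tracker just keeps its previous estimate until the next hit.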

Pong detection shown in red

Ray cast for bounce prediction

At this point we already have a working AI. If we just move the player paddle so that it is at the same y-position as the pong, it does a fairly good job. However, it runs into problems when the pong gets a good bounce going. The paddle is just too slow to keep up and needs to predict where the pong will be rather than just moving to where it currently is. This has already been implemented in the clips above, but below is a comparison of the two methods.

Side by side of the two AI options. Left is simple follow, right is prediction of bounce with ray cast

The difference isn’t huge, but the right-hand AI definitely wins more consistently. To do this I first created a list of the pong’s positions. I kept this list at a length of just 5 for averaging’s sake, though more or fewer could be used. You probably don’t want more, otherwise it takes longer to figure out that the pong has changed direction. After getting the list of positions I used simple vector averaging to smooth out and obtain the direction vector, shown by the green arrow. This was also normalized to a unit vector and then multiplied by a length for visualization purposes.
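The vector-averaging step can be sketched like this; the helper name is mine, and it assumes a short history of (x, y) centroid positions like the 5-long list described above:

```python
import numpy as np

def direction_vector(positions):
    """Average displacement over a short history of (x, y) positions, as a unit vector."""
    pts = np.asarray(positions, dtype=float)
    if len(pts) < 2:
        return np.zeros(2)  # not enough history to infer a direction
    deltas = np.diff(pts, axis=0)  # per-frame displacements
    mean = deltas.mean(axis=0)     # smoothed direction
    norm = np.linalg.norm(mean)
    return mean / norm if norm > 0 else np.zeros(2)
```

Averaging the per-frame displacements is what irons out the one-frame detection misses mentioned earlier, at the cost of a few frames of lag after each bounce.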

Casting the ray is just an extension of this: making the forward projection longer. I then checked whether the future positions fell outside the boundary of the top and bottom area. If so, the position is reflected back into the play area. For the left and right sides, it calculates where the intersection with the paddle’s x-position will occur and fixes the x- and y-position to that point. This makes sure the paddle targets the correct position; without this it would often move too far. Here is the code for defining the ray that predicts the future position of the pong:

def pong_ray(pong_pos, dir_vec, l_paddle, r_paddle, boundaries, steps=250):
    future_pts_list = []
    for i in range(steps):
        x_tmp = int(i * dir_vec[0] + pong_pos[0])
        y_tmp = int(i * dir_vec[1] + pong_pos[1])

        if y_tmp > boundaries[3]:  # bottom: reflect back into the play area
            y_end = int(2 * boundaries[3] - y_tmp)
            x_end = x_tmp
        elif y_tmp < boundaries[2]:  # top: reflect back into the play area
            y_end = int(2 * boundaries[2] - y_tmp)
            x_end = x_tmp
        else:
            y_end = y_tmp

        # stop where the paddle can reach
        if x_tmp > r_paddle[0]:  # right
            x_end = int(boundaries[1])
            y_end = int(pong_pos[1] + ((boundaries[1] - pong_pos[0]) / dir_vec[0]) * dir_vec[1])
        elif x_tmp < boundaries[0]:  # left
            x_end = int(boundaries[0])
            y_end = int(pong_pos[1] + ((boundaries[0] - pong_pos[0]) / dir_vec[0]) * dir_vec[1])
        else:
            x_end = x_tmp

        end_pos = (x_end, y_end)
        future_pts_list.append(end_pos)

    return future_pts_list

In the above, the perhaps less obvious calculation is determining the intercept of the left or right positions for the paddles to target. We do this essentially by similar triangles, with the diagram and equation shown below. We know the x-position of the intercept with the paddle, which is given in boundaries. We can then calculate how far the pong will travel vertically over that horizontal distance and add that to the current y-position.

Schematic for the calculation of the intercept position for paddle targeting
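That similar-triangles relation amounts to a one-liner. In the notation of the code above, with `dir_vec` the direction and `paddle_x` the paddle’s x line (the function name is mine):

```python
def intercept_y(pong_pos, dir_vec, paddle_x):
    """y-coordinate where the pong's ray crosses the paddle's x line (similar triangles)."""
    # horizontal distance to cover, in units of the direction vector's x-component
    steps_to_paddle = (paddle_x - pong_pos[0]) / dir_vec[0]
    # advance the y-position by the same number of steps
    return pong_pos[1] + steps_to_paddle * dir_vec[1]
```

This is exactly the `y_end` expression in the left/right branches of `pong_ray`, before any wall reflection is applied.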

The paddles, although they look straight, actually have a curved rebound surface: if you hit the ball towards the end of the paddle, it bounces as if the paddle were angled. I therefore allowed the paddle to hit at the edges, which adds some offence to the AI, sending the pong flying around.

Conclusions

Although designed for this particular implementation of pong, the same concepts and code could be used for any version; it just comes down to changing some of the pre-processing steps. Of course, another approach is machine learning, through reinforcement learning or just a simple conv net, but I like this classical approach; at least in this instance, where I don’t need robust generality or difficult image-processing steps. As I mentioned, this version of pong is 2-player, and I honestly cannot beat my own AI…


If any part of this post provided some useful information or just a bit of inspiration, please follow me for more.

You can find the source code on my GitHub.

Link to my other posts:

  • Minecraft Mapper — Computer Vision and OCR to grab positions from screenshots and plot
