You never get bored playing with Computer Vision

…and how to make Dino run alone regardless of the platform

Denys Periel
Towards Data Science


In this article, you will learn how to record the screen at a decent frame rate using Python and MSS, how to use template matching and edge detection with OpenCV, and, if you wish, how to make your machine play a game.

Source: Author

Intro

I like automation, and once I read a review written by Markus Rene Pae about yet another Python library, PyAutoGUI. The library lets you manipulate OS inputs, such as emitting mouse or keyboard events. One of the challenges Markus proposed was to automate the Google Dino game. I was curious: does PyAutoGUI allow me to capture the monitor in real time, find Dino, and perform a jump when it's needed? I decided to give it a try and not to stick to a browser implementation, so Dino should run regardless of whether it's a browser or a standalone app. Later in this article you'll find out for which tasks PyAutoGUI works well, and for which it's better to use other techniques.

Which libraries I used in the end to make Dino run alone

What is CV (Computer Vision) and OpenCV in a nutshell

Computer vision is a pretty hot topic nowadays. It is used in many places where images or videos need to be processed. Take Face ID, for example: before recognising that it is you, it first detects a face, then processes the picture and asks an ML (Machine Learning) model to classify whether it is you or someone else trying to unlock your phone. One of the most popular CV libraries at the moment is OpenCV. According to the official website, the library has more than 2500 optimised algorithms, including a comprehensive set of both classic and state-of-the-art computer vision and machine learning algorithms. OpenCV is written in C++, available on all platforms, and provides APIs for C++, Python, and other languages. It can also accelerate calculations on your GPU if it supports the CUDA or OpenCL standards.

My attempt to automate the game with PyAutoGUI

Screenshot from Chrome browser with no Internet connection

First I implemented a game loop and tried to use only PyAutoGUI to capture the screen and match a template (in my case, the Dino). Technically it worked, but… screenshots in PyAutoGUI are not intended for real-time capturing, so I got a latency of about ONE second between frames. That's far too much, because Dino runs at more than 400 pixels per second, and by the time my program pressed the "jump" key, the game was over. To mitigate the latency I specified which area to capture each time, which brought it down to around 0.4 seconds. Better, but still not enough. I realised I needed something else for object detection, and that all calculations should happen at at least 30 FPS (Frames Per Second). That means all calculations and side effects must fit within 0.033 seconds.
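To see whether a capture back-end fits that budget, you can time it directly. Here is a minimal timing helper (the function name and the stand-in workload are mine, not from the project); swap the lambda for `pyautogui.screenshot()` or `sct.grab(monitor)` to compare approaches:

```python
import time

def time_per_call(fn, n=10):
    """Return the average wall-clock seconds per call of fn over n runs."""
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - start) / n

# Stand-in workload; replace with a real screenshot call.
# At 30 FPS the result must stay under 1/30 ≈ 0.033 s.
print(time_per_call(lambda: sum(range(10_000))))
```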

MSS jumps into the game

First, what is MSS?
According to the docs, MSS is an ultra-fast cross-platform multiple screenshots module in pure Python using ctypes. The API is easy to use, and it integrates with NumPy and OpenCV. Why did I choose MSS? Basically, if you want to capture the whole screen, MSS does it fast, much faster than other libraries. If you need to stream your screen somewhere, I'd go with this library.

After experimenting with different libraries that provide screenshot functionality, I realised that most of them take the same approach: each time you grab the screen, a "connection" to the screen resources is re-established. I haven't dug too deep into this part, so I can only say that too much time is spent on this re-establishment. MSS, meanwhile, is optimised for each OS: when you grab a screen on Linux, for example, it calls the XGetImage method on an already created "connection" to your screen resources. That means you can initialise the MSS instance with a with statement, run your game loop inside it, and get much better performance.

with mss.mss() as sct:
    while True:
        screen = sct.grab(monitor)
        process_image(screen)
        if trigger_to_leave:
            break

Yep, that simple. This way you speed up grabbing the screen hundreds of times. Here I achieved screenshots at 100 FPS, and I even added a sleep to cut redundant calculations. Next, we need to process the image, analyse all the blocks, and "jump" when needed.
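The sleep can be implemented as a simple frame-rate cap. Here is a minimal sketch (the 30 FPS target and helper name are my own choices, not taken from the project):

```python
import time

TARGET_FPS = 30
FRAME_BUDGET = 1.0 / TARGET_FPS  # ~0.033 s per frame

def cap_frame_rate(frame_start):
    """Sleep away whatever is left of the current frame's time budget."""
    elapsed = time.perf_counter() - frame_start
    if elapsed < FRAME_BUDGET:
        time.sleep(FRAME_BUDGET - elapsed)

# Inside the game loop it would look like:
#   start = time.perf_counter()
#   screen = sct.grab(monitor)
#   process_image(screen)
#   cap_frame_rate(start)
```

This keeps the loop from burning CPU on frames that arrive faster than they can matter.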
I split this into two parts:

  1. Find the Dino on the screen and detect the area with "obstacles".
  2. Grab that area in a loop, calculate the distances to the obstacles, and calculate the velocity.

Let’s review these steps.

Find a dino on the screen and detect an area with “obstacles”

This part is visually represented in a Jupyter Notebook on my GitHub: https://github.com/dperyel/run-dino-run/blob/master/search.ipynb

At this point I make heavy use of OpenCV for image processing, template matching, and edge detection. First I eliminate the colour channels and keep a single channel by converting the image with cv2.cvtColor(img, cv2.COLOR_BGR2GRAY). Then I need to remove the differences between Day and Night.

Day on the left, Night on the right

What can we do about this? There are many possible approaches; I decided to use the Canny algorithm to detect edges, with extreme values for the min and max thresholds. That gives me pretty much the same picture for both the day and night canvases.

Day on the left, Night on the right

Of course, if an image had a lot of noise I would need to blur it first, but in this particular case finding edges is enough, and I'm good to use template matching to find the Dino. The only caveat is that the template is not scaled during matching, so the Dino template should be taken from the screen where the game will be played. Alternatively, you could extend this functionality and perform template matching with template scaling.

By using cv2.matchTemplate I get the location of the match. Initially you get a bunch of locations, because as OpenCV slides the template over the source image it compares each area and produces a matching value; the matching values represent how precisely the pixels matched. In my case I'm looking for only one Dino, which means I can take the highest value and use the location mapped to it.

match_res = cv2.matchTemplate(canny_night, dino_bordered_template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(match_res)

The max_val for me is 0.81 on average, which means my template matched the image at 81%. Good enough to continue.

Knowing the Dino's location, we can highlight the area where the obstacles appear. I select the part where nothing but obstacles is visible; this is needed to group barriers like cactuses and birds.

The blue rectangle represents the area I need to review on each frame
The area to watch. The image is post-processed with edge detection.

The grouping wasn't too hard since, essentially, I have a matrix (image) where each cell holds one of two values, 0 or 255 (a kind of non-normalised binary matrix), and I need to find, from the left, the positions of groups where at least one pixel with value 255 exists. To do this I iterate through the canvas along the X-axis with a step that defines the minimum distance between "obstacles". Each group is a tuple of a position from the left and the width of the group. When all groups are found, I trim them from the left to determine the exact edge of each "obstacle"; this is needed for the later part, which replaces a full optical flow. It's also worth mentioning that because the step value is constant, the complexity of this approach is linear, O(n + k), where n is the width of the canvas and k is the number of "obstacles". That matters because this calculation runs on every frame, so performance is a concern here. Below you can see a visual representation of how the grouping works.

Visual representation of how the “obstacles” are grouped. Source: Author
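The grouping described above can be sketched as follows, assuming a binary (0/255) edge canvas; the function name and the step value are illustrative, not taken from the repo:

```python
import numpy as np

def group_obstacles(canvas, step=20):
    """Group columns containing edge pixels into (left, width) tuples.
    Columns closer than `step` pixels belong to the same obstacle."""
    # x coordinates of every column that has at least one 255 pixel
    occupied = np.flatnonzero(canvas.max(axis=0) == 255)
    groups = []
    for x in occupied:
        if groups and x - (groups[-1][0] + groups[-1][1]) < step:
            left, width = groups[-1]
            groups[-1] = (left, x - left + 1)   # extend the current group
        else:
            groups.append((x, 1))               # start a new group
    return groups
```

A single left-to-right pass gives the linear behaviour the paragraph above describes, and the first tuple in the list is always the nearest obstacle.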

And now I have everything to switch to the next step.

Making Dino run and jump with velocity calculation

The running script, which finds the Dino and starts the game loop, is located in the following file: https://github.com/dperyel/run-dino-run/blob/master/run.py

Alright, I'd say the most complicated work was done in the first part. Now we know at least the distance to the first "obstacle" and can call pyautogui.press('space') if the "danger" object is too close. The problem is that the game changes speed: Dino runs faster and faster. My first idea was to use optical flow, the Lucas-Kanade algorithm in particular, to compare the previous and current frames; from the pixel deviations I could calculate the speed. It would work, but I already have everything I need: my "obstacle" groups are exactly the features I need to track, and I can store the state from the previous frame to find the deviation. I always feel relief when I avoid a complicated algorithm and get the same result with a couple of "pluses and minuses" (-:
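That bookkeeping can be sketched roughly like this (the class and variable names are mine; `first_group_left` stands for the left edge of the nearest obstacle group on the current frame):

```python
import time

class VelocityTracker:
    """Estimate obstacle speed from the nearest group's left edge on
    consecutive frames, instead of running a full optical-flow pass."""

    def __init__(self):
        self.prev_left = None
        self.prev_time = None
        self.velocity = 0.0  # pixels per second, positive = moving toward Dino

    def update(self, first_group_left, now=None):
        now = time.perf_counter() if now is None else now
        if self.prev_left is not None and first_group_left < self.prev_left:
            # Same obstacle moved closer; a jump in position means a new one entered
            dt = now - self.prev_time
            if dt > 0:
                self.velocity = (self.prev_left - first_group_left) / dt
        self.prev_left = first_group_left
        self.prev_time = now
        return self.velocity
```

With the velocity known, the jump trigger can simply scale with it, for example pressing "space" once the distance to the first group drops below velocity times some reaction time.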

Knowing the velocity, it is a matter of math (or time) to work out at which distance from an "obstacle" you need to trigger a "jump". And here is the result.

As a result, Dino runs day and night, and jumps when a cactus or a bird comes closer. Source: Author

Conclusion

Computer vision is a great component of many automation processes. As this small example shows, you can even build a complete end-to-end test for a game by letting the machine find the sensitive parts.

It is a good idea to try different libraries and compare their performance. In my case, MSS is the absolute winner for screen capturing, while PyAutoGUI is still used for the other side effects.

P.S.: All sources are on my GitHub: https://github.com/dperyel/run-dino-run
The repo uses Git LFS to store all binary files. To make the script work, you need to take a screenshot of assets/dino_crop.png from your own monitor, or implement template scaling (-;
The code may contain bugs, as it was written mostly as a POC.
You are welcome to comment or ask questions below.


