Ball tracking in volleyball with OpenCV and Tensorflow

Computer vision and neural networks in SportTech

Constantin Toporov
Towards Data Science


Introduction

After the first experience of applying AI in sport, I was inspired to continue. Home exercises now looked like an insignificant goal, so I targeted team play.

AI in sports is a pretty new thing. There are a few interesting works; see the Links section at the end.

I am a big fan of playing volleyball, so let's talk about the last reference. It is the site of an Austrian institute that analyzes games of a local amateur league.
There are some documents to read and, even more important, an open video dataset.

Volleyball is a complex game with many different aspects. So I started with a small but very important piece — the ball.

Ball tracking is a pretty famous task. Google gives a lot of links, but many of them are just demos. Obviously, recognizing and tracking a big, distinctly colored ball in front of a camera cannot be compared with ball detection in a real game, where the ball is tiny, moves fast and blends into the background.

In the end, we want to get something like this:

Before starting, let's note some details about the video dataset:

  • the camera is static and located behind the court
  • the skill level allows the ball to be seen clearly (professionals hit so hard that it is almost impossible to follow the ball without a TV replay)
  • the ball colors, blue and yellow, unfortunately do not contrast much with the floor, which makes color-based approaches useless

Solution

Since the most obvious approach, based on color, does not work, I used the fact that the ball is moving.
So let's find moving objects and pick the ball. Sounds easy.

OpenCV contains tools to detect moving objects with background subtraction:

import cv2 as cv

backSub = cv.createBackgroundSubtractorMOG2()  # assumption: the article does not show how backSub is created
mask = backSub.apply(frame)
mask = cv.dilate(mask, None)
mask = cv.GaussianBlur(mask, (15, 15), 0)
ret, mask = cv.threshold(mask, 0, 255, cv.THRESH_BINARY | cv.THRESH_OTSU)

A picture like this:

is transformed into:

In this example, the ball is at the top, and the human eye can easily detect it. How do we decide? Some rules can be deduced from the picture:

  • the ball is a blob
  • it is the highest blob on the picture

The second rule does not work well. In this picture, for example, the highest blob is the referee’s shoulder.

But the highest-blob approach provides initial data for further steps.

We can collect these blobs and train a classifier to distinguish the ball.
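To make this step concrete, here is a minimal sketch of extracting blob crops from the mask and ordering them from the top of the frame down. The crop size and the area thresholds are my assumptions, not values from the original code:

import cv2 as cv

def extract_blobs(mask, frame, crop_size=32, min_area=4, max_area=400):
    """Return small crops around each blob in the mask, topmost blob first."""
    contours, _ = cv.findContours(mask, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
    blobs = []
    for c in contours:
        area = cv.contourArea(c)
        if area < min_area or area > max_area:   # skip noise and large blobs such as players
            continue
        x, y, w, h = cv.boundingRect(c)
        cx, cy = x + w // 2, y + h // 2          # blob center
        half = crop_size // 2
        crop = frame[max(cy - half, 0):cy + half, max(cx - half, 0):cx + half]
        blobs.append((y, crop))
    blobs.sort(key=lambda b: b[0])               # smallest y = highest in the image
    return [crop for _, crop in blobs]

The crops returned by such a function, labeled as ball or not-ball, form the training dataset for the classifier described next.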

The dataset looks like this:

In terms of AI, this is binary classification of color images, very similar to the Cats-vs-Dogs challenge.

There are many ways to implement it, but the most popular approach uses a VGG neural network.

One problem: ball pictures are very small, and multiple convolution layers do not fit into them. So I had to cut VGG down to a very simple architecture:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Convolution2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.optimizers import SGD

model = Sequential([
    Convolution2D(32, (3, 3), activation='relu', input_shape=input_shape),  # input_shape: the small crop size, e.g. (32, 32, 3)
    MaxPooling2D(),
    Convolution2D(64, (3, 3), activation='relu'),
    MaxPooling2D(),
    Flatten(),
    Dense(64, activation='relu'),
    Dropout(0.1),
    Dense(2, activation='softmax')
])

model.compile(loss="categorical_crossentropy", optimizer=SGD(learning_rate=0.01), metrics=["accuracy"])
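For completeness, here is a minimal sketch of how such a model could be trained on the collected blob crops. The directory layout, crop size, batch size and epoch count are my assumptions, not the article's actual training setup:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Hypothetical layout: dataset/ball/*.png and dataset/not_ball/*.png
datagen = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.2)
train_gen = datagen.flow_from_directory("dataset", target_size=(32, 32), batch_size=32,
                                        class_mode="categorical", subset="training")
val_gen = datagen.flow_from_directory("dataset", target_size=(32, 32), batch_size=32,
                                      class_mode="categorical", subset="validation")

model.fit(train_gen, validation_data=val_gen, epochs=20)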

The model is simple and produces mediocre results: about 20% false positives and about 30% false negatives.
That is better than nothing but, of course, not enough.

The model applied to the game generates many false balls:

There are actually two kinds of false balls:

  • some appear in random places at random times
  • others are consistent mistakes, where the model keeps recognizing something else as a ball

Trajectories

As the next step, there is an idea: the ball does not move randomly but follows parabolic or linear trajectories.

Validating blob movements against this geometry will cut off both random and consistent mistakes.

Here is an example of trajectories recorded during one rally:

Directed paths are shown in blue, static ones in green, and random ones in grey.
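To illustrate, a minimal sketch of how a short track could be labeled as directed, static or random; the linear-plus-parabolic fit and the thresholds are my assumptions, not the original implementation:

import numpy as np

def classify_track(track, static_eps=2.0, fit_tol=10.0):
    """Label a list of (x, y) points as 'directed', 'static' or 'random'."""
    pts = np.asarray(track, dtype=float)
    if len(pts) < 3:
        return "random"                       # too short to establish a direction
    if np.linalg.norm(pts[-1] - pts[0]) < static_eps:
        return "static"                       # the blob barely moved
    t = np.arange(len(pts))
    x_fit = np.polyval(np.polyfit(t, pts[:, 0], 1), t)   # roughly linear motion across the frame
    y_fit = np.polyval(np.polyfit(t, pts[:, 1], 2), t)   # roughly parabolic motion under gravity
    err = np.max(np.hypot(pts[:, 0] - x_fit, pts[:, 1] - y_fit))
    return "directed" if err < fit_tol else "random"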

Only the blue trajectories are interesting. They consist of at least 3 points and have a direction. The direction is very important because the next point can be forecast when it is missed in the real stream and no new path is detected.
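And here is a sketch of the forecasting idea itself, assuming constant velocity between the last two points; the matching distance threshold is also an assumption:

import numpy as np

def forecast_next(track):
    """Predict the next point of a directed track from its last two points."""
    p_prev, p_last = np.asarray(track[-2], float), np.asarray(track[-1], float)
    return p_last + (p_last - p_prev)

def extend_track(track, detections, max_dist=30.0):
    """Attach the detection closest to the forecast, or fall back to the forecast itself."""
    predicted = forecast_next(track)
    if detections:
        candidates = np.asarray(detections, dtype=float)
        dists = np.linalg.norm(candidates - predicted, axis=1)
        best = int(np.argmin(dists))
        if dists[best] <= max_dist:
            track.append(tuple(candidates[best]))
            return track
    track.append(tuple(predicted))            # missed detection: keep the forecast point
    return track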

This logic, applied to the clip, generates pretty realistic tracking:

Links

Github repo

Video source

Improved Sport Activity Recognition using Spatio-temporal Context
Georg Waltner, Thomas Mauthner and Horst Bischof
In Proc. DVS-Conference on Computer Science in Sport (DVS/GSSS), 2014

Indoor Activity Detection and Recognition for Automated Sport Games Analysis
Georg Waltner, Thomas Mauthner and Horst Bischof
In Proc. Workshop of the Austrian Association for Pattern Recognition (AAPR/OAGM), 2014
