Many different fish species travel vast distances every year to reach their breeding grounds. Nowadays this journey is made more difficult by obstacles such as water locks. One of these locks, placed along a popular route for migrating fish, is the Weerdsluis in Utrecht. In order to raise awareness, an initiative was launched together with the municipality of Utrecht: the Fish Doorbell. Users could watch a livestream of the waters at the lock and ‘ring’ a bell if they spotted a fish. The lock could then be opened to let any waiting fish through. The initiative quickly went viral and the doorbell was pressed over 100,000 times!

Although this was obviously a campaign to raise awareness, we started wondering if there was a way to automate the detection of these fish, so they could get through even when no one was watching the stream. We created a solution using image processing methods and deep learning to do just that.
Our solution
So how exactly can we detect these fish? We came up with a solution consisting of two steps:
- Using conventional image processing to extract patches containing movement from stills of the video
- Using a fast and simple Convolutional Neural Network (CNN) to determine whether an image patch contains a fish
Image processing
The moments when a fish is visible make up only a small portion of the livestream. We want to collect image patches of potential fish swimming by, and we want these patches to contain as much useful information as possible. To do this, we implemented a background subtraction method to detect motion and obtain image patches that contain fish without background. How we did this is explained in the section "Data collection". The retrieved image patches are then resized to a resolution of 100×100 pixels and fed into the next part: the CNN.
The network
The model should be able to run in real time in order to work with the livestream. We also prefer a network that works with only limited training data. We therefore opted for a simple CNN for binary fish classification, consisting of only two convolutional layers followed by two fully connected layers. The resulting network architecture can be seen in the figure below. In some testing we found that adding more layers did not have a significant positive effect, although there is still room for optimisation.
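A minimal Keras sketch of such a network looks as follows; note that the filter counts and layer sizes below are our own illustrative choices, not necessarily the exact values from the figure:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_fishnet(input_shape=(100, 100, 3)):
    """Small CNN for binary fish / no-fish classification:
    two convolutional layers followed by two fully connected layers."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        # Two convolutional layers, each followed by pooling.
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        # Two fully connected layers; sigmoid output for the binary class.
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    return model

model = build_fishnet()
model.summary()
```

The sigmoid output gives a fish probability between 0 and 1, which can later be thresholded to decide whether to ring the bell.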

Gathering and annotating data
Since we are approaching this as a supervised learning task, we need annotated data to train our network. Because a public annotated dataset of fish swimming through the Weerdsluis unsurprisingly did not yet exist, we had to create our own.
Data collection
We read the livestream directly from visdeurbel.nl, using OpenCV and the livestream URL extracted from the website. Using an Exponentially Weighted Moving Average (EWMA), we obtain a moving background image of the stream, as seen in the top left of the image below. This background image does not contain fast-moving objects like fish, and it changes with the lighting conditions during the day.
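The EWMA update itself is only a couple of lines. A NumPy sketch is shown below; the decay factor `alpha` is our assumption, and OpenCV's `cv2.accumulateWeighted` implements the same update:

```python
import numpy as np

ALPHA = 0.05  # decay factor: higher reacts faster to lighting changes

def update_background(background, frame, alpha=ALPHA):
    """EWMA update: the background slowly tracks the scene, so fast-moving
    objects like fish leave almost no trace in it."""
    if background is None:
        return frame.astype(np.float32)
    return (1.0 - alpha) * background + alpha * frame.astype(np.float32)

# Simulated stream of identical frames: the background converges to them.
background = None
frame = np.full((4, 4), 100.0, dtype=np.float32)
for _ in range(50):
    background = update_background(background, frame)
print(float(background.mean()))  # 100.0
```

Because old frames decay exponentially, the background adapts to slow lighting changes while ignoring anything that moves quickly through the frame.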

Next, we perform a simple numeric subtraction of the background image from the latest frame of the video and amplify the result five times to obtain an image containing only moving objects, as shown in this video.
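The subtraction and amplification step can be sketched as follows; taking the absolute difference and clipping the result to 0–255 are our own choices to keep a valid image:

```python
import numpy as np

def motion_image(frame, background, gain=5.0):
    """Subtract the EWMA background from the current frame and amplify
    the difference so faint moving objects become clearly visible."""
    diff = np.abs(frame.astype(np.float32) - background)
    return np.clip(diff * gain, 0, 255).astype(np.uint8)

background = np.full((3, 3), 100.0, dtype=np.float32)
frame = np.full((3, 3), 110, dtype=np.uint8)  # a "fish" 10 units brighter
print(motion_image(frame, background))        # every pixel becomes 50
```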
Finally, we apply filters to the image and perform a threshold operation, resulting in a mask containing the contours of moving objects. If a contour exceeds a certain size and can be tracked for 20 frames, we save a frame of the video together with a 100×100 pixel patch of the subtracted image. In a span of only 2–3 days, we collected over 7000 pictures of potential fish.
Data annotation
Since annotating thousands of image patches did not sound that appealing to us, we wanted to make it as easy as possible. For this we wrote a separate script: annotator.py. This script loads the image patches that have not been annotated yet and allows the user to label them with a single key press. It also counts the total number of annotations, the number of annotations in the current session and your average number of annotations per minute, sometimes reaching more than 55 per minute. We distinguish two classes: fish (±30%) and no-fish (±70%). Images can also be marked as unclear, after which they are ignored during training. We ended up annotating around 3000 samples, of which 2500 are clear and used for our network.

Training the network
We built the network using TensorFlow 2 and Keras. We use Adam as the optimizer and binary cross-entropy as our loss function, with a train-test split of 80/20. Training the network is really fast and takes only about a minute on an ordinary laptop. We did some brief experimentation to find which hyperparameters work best for our problem. In future work we would like to perform more extensive hyperparameter optimisation in order to achieve optimal results.
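Putting the pieces together, the training setup can be sketched as below. The batch size, epoch count and the (deliberately tiny) model are our assumptions, and random data stands in for the ~2500 annotated patches:

```python
import numpy as np
import tensorflow as tf

# Stand-in for the annotated 100x100 patches (random data here).
X = np.random.rand(200, 100, 100, 3).astype("float32")
y = np.random.randint(0, 2, size=(200,))

# 80/20 train-test split, as in the post.
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(100, 100, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(4),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Adam optimizer and binary cross-entropy loss, as described above.
model.compile(optimizer="adam",
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=["accuracy"])

history = model.fit(X_train, y_train, epochs=2, batch_size=32,
                    validation_data=(X_test, y_test), verbose=0)
print(sorted(history.history.keys()))
```

The `history` object records train and validation loss/accuracy per epoch, which is exactly what the curves in the next section plot.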
Results
The plot below shows the accuracy on the training and test datasets over the number of training epochs. Interestingly, the test accuracy curve flattens after 20 epochs but does not decrease as the model overfits. For the best model we use early stopping after approximately 10 epochs.

Here the loss of the model is plotted. While the training loss settles close to 0, the test loss keeps increasing. Our hypothesis is that as the network pushes the training loss toward 0, it becomes increasingly confident in its incorrect predictions. This does not negatively impact the accuracy, but it does increase the loss for such predictions, raising the test loss overall.

Here the ROC curve is plotted. The model achieves a high True Positive Rate (or recall), even at a relatively low False Positive Rate. This means the model detects fish very often, while having a very low chance of classifying other objects as fish. This is especially important for this application, since we do not want the water lock to be opened for no reason.
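For reference, an ROC curve is computed from the model's predicted probabilities. A scikit-learn sketch on toy data (illustrative only, not our actual results) looks like this:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Toy labels and predicted fish probabilities (illustrative only).
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
# Scores that separate the classes: >= 0.6 for fish, <= 0.4 for no-fish.
y_score = 0.6 * y_true + 0.4 * rng.random(200)

fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
print(round(auc, 2))  # 1.0 for this perfectly separated toy data
```

In practice, one would then pick the operating threshold from this curve that keeps the False Positive Rate low, so the lock is rarely opened without a fish present.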

Status of the project
Our plan with this project is to have the model classify fish in real time. Up to this point, however, we have only been able to test the model on image patches collected on the same day. The moment we wanted to connect the model to the livestream, we found out that the Fish Doorbell had just gone offline until the spring of 2022, so we can no longer demonstrate the model on the livestream. We hope to be able to show the model working on video data at some point. If that happens, we will definitely update this blog post with our results.
Simon has already compiled a list of major and minor changes for Fish-Net v2, with better background filtering, data collection, value embedding, etc., so be sure to check back for more next year.
Contact
Simon van Eeden ~ 5185734 [email protected] Wouter de Leeuw ~ 4487753 [email protected]
For access to the source code used in this project, please send an email to: [email protected]