
Ever struggled with taking photos of animals?

Or have you ever found yourself taking a bunch of photos of your pet… and later going through each one, manually selecting the bad ones to delete?
Well… the app Picture Purrfect was created to make taking cat photos easy for you! Just upload a short video of your cat (especially when they’re playing or moving around), and the algorithm will auto-select the best frame for you.
Here’s a short demo of the app:
If you are interested in how it works, continue reading! Or you can jump to the end for the full presentation.
There are four parts that I’ll talk about: Data, Feature Engineering, Model, and Web App.

Data
The data used to build the algorithm consist of 120 cat videos, either from the internet (multiple YouTube cat channels) or taken by myself. Each video is 10–30 seconds long. Together they yield over 2,000 frames, and each frame is labeled as Good or Bad.

As you can see in the examples above, a Good frame contains a cat with its face clearly shown. If the cat’s face is blocked or blurry, the frame is labeled as Bad. But as you can imagine, beyond these basic criteria there is a certain degree of subjectivity depending on who is labeling the frames (in this case it’s me). The ultimate question is: how can we teach the algorithm to learn how we think? For that, we need features that characterize each frame quantitatively. (If we were able to collect enough labeled frames, we could use a deep learning approach without worrying about hand-crafted features. Unfortunately, for this project I did not have enough time or the right source to collect a massive amount of data.)
Feature Engineering
Feature engineering is certainly the main focus of this project, as it plays an important role in model training. There are three categories of techniques:
- Cat face detection
- Blur detection
- Cat facial features detection
Cat face detection

I use a Haar cascade classifier (this blog post has a brief explanation of how cascade classifiers work) as a quick filter to sift through all the frames in the given video and keep only those containing cats for the remaining (more complex and time-consuming) feature engineering steps.
With this detector you obtain the coordinates of the cat face center as well as the height and width of the bounding box around the cat face.
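A minimal sketch of this step with OpenCV, using the `haarcascade_frontalcatface.xml` model that ships with the library (the detection parameters here are illustrative, not the app’s exact settings):

```python
import cv2

# Load OpenCV's pre-trained cat face cascade (bundled with opencv-python).
cascade_path = cv2.data.haarcascades + "haarcascade_frontalcatface.xml"
cat_cascade = cv2.CascadeClassifier(cascade_path)

def detect_cat_face(frame):
    """Return the (x, y, w, h) box of the largest detected cat face, or None."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cat_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None  # frame is dropped from the remaining feature engineering steps
    # Keep the largest box, assuming it is the main cat in the frame.
    return max(faces, key=lambda box: box[2] * box[3])
```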
From the position and size of the bounding box, we can further find out whether the cat is the main subject of the frame (the distance from the face center to the frame center, and the size ratio of the face to the whole frame):

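For illustration, those two features could be computed from the detected box roughly like this (a sketch; the exact formulas used in the project may differ):

```python
import math

def face_position_features(box, frame_shape):
    """Normalized distance of face center to frame center, and face-to-frame area ratio."""
    x, y, w, h = box
    frame_h, frame_w = frame_shape[:2]
    face_cx, face_cy = x + w / 2, y + h / 2
    frame_cx, frame_cy = frame_w / 2, frame_h / 2
    # Normalize by the frame diagonal so the feature is resolution independent.
    diag = math.hypot(frame_w, frame_h)
    center_dist = math.hypot(face_cx - frame_cx, face_cy - frame_cy) / diag
    area_ratio = (w * h) / (frame_w * frame_h)
    return center_dist, area_ratio
```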
Blur detection
We can also use some basic image processing techniques to perform blur detection on each frame. There are many different approaches, but in general all of them relate to how sharp the edges are in an image, i.e. edge detection. I choose to go with the Laplacian and Canny filters.
The Laplacian method computes the second derivatives of an image, which measure the rate at which the first derivatives change. We can then see whether a change in adjacent pixel values comes from a discontinuous edge or from a continuous progression.
Canny edge detection is slightly more complex. It has multiple stages, including noise reduction with a Gaussian filter. The resulting image is binary (mostly black, with the detected edges traced as white lines).
The example below shows a frame with clear cat face and another one with blurry cat face. You can see how the frames look after applying the Laplacian and Canny operators:

We can then calculate the variance in the processed frames and determine how sharp or how blurry the original frame is. The calculations can be done for the whole frame or for the cat face only. The ratio of cat face sharpness to the whole frame sharpness should also be considered as a feature.
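A rough sketch of these sharpness features with OpenCV: the variance of the Laplacian plus the edge-pixel density of the Canny output, computed for the whole frame and for the face crop (the Canny thresholds here are illustrative):

```python
import cv2

def sharpness_features(frame, face_box):
    """Laplacian-variance and Canny edge-density features for the frame and the cat face."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    x, y, w, h = face_box
    face = gray[y:y + h, x:x + w]

    def measures(img):
        lap_var = cv2.Laplacian(img, cv2.CV_64F).var()  # higher variance = sharper
        edges = cv2.Canny(img, 100, 200)                # binary edge map
        edge_density = (edges > 0).mean()
        return lap_var, edge_density

    frame_lap, frame_edges = measures(gray)
    face_lap, face_edges = measures(face)
    # Ratio of face sharpness to whole-frame sharpness, as mentioned above.
    lap_ratio = face_lap / (frame_lap + 1e-6)
    return frame_lap, frame_edges, face_lap, face_edges, lap_ratio
```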
Cat facial features detection

The first two categories relate more to the macro level of the frame. For the micro level, we are going to identify the cat’s facial features such as ears, eyes and nose. This can be done with a custom-trained object detector based on a neural network architecture (YOLOv3 by Darknet). Similar to the face detection case, the center coordinates and the height and width of the bounding box around each feature are returned. This actually gives us a lot of information!
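Running a custom Darknet model like this can be done through OpenCV’s DNN module, roughly as below. The cfg/weights file names and the class list are placeholders for the custom-trained detector, and the decoding of the YOLO output is simplified (no non-maximum suppression):

```python
import cv2
import numpy as np

# Placeholder file names for the custom-trained YOLOv3 cat facial feature detector.
net = cv2.dnn.readNetFromDarknet("cat-features-yolov3.cfg", "cat-features-yolov3.weights")
CLASSES = ["ear", "eye", "nose"]  # assumed class list

def detect_facial_features(frame, conf_threshold=0.5):
    """Return a list of (class_name, center_x, center_y, width, height) detections."""
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())
    detections = []
    for output in outputs:
        for row in output:  # [cx, cy, w, h, objectness, class scores...], all relative
            scores = row[5:]
            class_id = int(np.argmax(scores))
            if scores[class_id] > conf_threshold:
                cx, cy, bw, bh = row[0] * w, row[1] * h, row[2] * w, row[3] * h
                detections.append((CLASSES[class_id], cx, cy, bw, bh))
    return detections
```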
Here is an example of the facial features detection for a cat going from wide awake to dozing off:

If we just look at the eyes, it is clear that from the height-to-width ratio of the cat’s eyes we can deduce the eye shape (whether the eyes are wide open or the cat is sleepy).
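As a sketch, an “eye openness” feature could simply be the average height-to-width ratio of the detected eye boxes (using the detection tuples from the sketch above):

```python
def eye_openness(detections):
    """Average height-to-width ratio of the detected eye boxes (lower = more closed).
    `detections` are (class_name, cx, cy, w, h) tuples from the detector sketch above."""
    ratios = [h / w for (name, _, _, w, h) in detections if name == "eye" and w > 0]
    return sum(ratios) / len(ratios) if ratios else 0.0
```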

This is another example of a cat taking a bath:

From the relative positions of the cat’s two eyes and nose, we can figure out the cat’s face angle: whether the cat is looking straight into the camera or looking sideways.
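As a sketch of one such feature (assuming both eyes and the nose were detected, and that a nose centered between the eyes roughly means a frontal face), a “frontalness” score could look like this:

```python
def frontalness(left_eye, right_eye, nose):
    """Rough face-angle proxy: 1.0 when the nose sits midway between the eyes
    (roughly frontal), approaching 0.0 as the face turns sideways.
    Each argument is a (class_name, cx, cy, w, h) detection tuple."""
    lx, rx, nx = left_eye[1], right_eye[1], nose[1]
    eye_span = abs(rx - lx)
    if eye_span == 0:
        return 0.0
    # Offset of the nose from the eyes' midpoint, relative to the eye spacing.
    offset = abs(nx - (lx + rx) / 2) / eye_span
    return max(0.0, 1.0 - 2 * offset)
```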

Model
After all the preparation (data collection, labeling, feature engineering), we are ready to train the model! Among the few different classifiers tested, random forest has the best performance in terms of ROC AUC (83.3%). Looking further into the feature importances, it’s not surprising to see that the top features are related to sharpness.
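A sketch of this step with scikit-learn; `X`, `y`, `feature_names`, and the hyperparameters below are placeholders rather than the project’s exact setup:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# X: one row of engineered features per frame; y: 1 = Good, 0 = Bad;
# feature_names: matching column names (all placeholders).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# ROC AUC is the metric used to compare classifiers above.
print("ROC AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

# Inspect the feature importances (e.g. the sharpness-related features).
for name, imp in sorted(zip(feature_names, model.feature_importances_),
                        key=lambda p: -p[1]):
    print(name, round(imp, 3))
```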

Web App
You’ve already seen the web app demo at the beginning! Here are a few frames selected by Picture Purrfect:

Hope you like it!
Full presentation
Learn more about the project:
Learn more about the app:
Other videos: