
Interactive Face Recognition Application through Docker

An approach to combining a camera device, an interactive GUI, Docker, a GPU and deep learning.

Photo by Daniil Kuželev on Unsplash.

This article will cover how to develop an interactive application that uses the Face Recognition framework by Adam Geitgey to recognize faces from one's camera or webcam device. To create an interactive application, we will use Tkinter, and Docker will be utilized to ensure an isolated environment containing all the necessary dependencies. This article can serve as a starting point for your own projects, since the approach is largely independent of which frameworks and libraries are used. Knowing how to create an interactive application that combines a camera device, GPU acceleration and deep learning frameworks through Docker opens up numerous possibilities.

If you solely want to run a GUI through Docker, you can follow this article instead:

Empowering Docker using Tkinter GUI


If you want to run the application immediately, you can clone my GitHub repository and follow the section "Run Application" near the end of this article.

Reading Guide

Throughout the article, the scripts within the application will be covered, and afterwards a guide to running the application will be presented. The article follows the order below.

Table of Contents

  1. Prerequisites
  2. Application Overview
  3. Docker
  4. Tkinter
  5. Camera Device
  6. Computer Vision
  7. Shell Script
  8. Run Application
  9. Conclusion

Prerequisites

This application has only been tested on Linux; however, it should work similarly on other operating systems, although some parameters may differ, for instance when running Docker.

For prerequisites, you should have a camera or webcam, Docker, CUDA and CuDNN installed. I have tested the application using:

  • Docker 20.10.8
  • CUDA 11.4
  • CuDNN 8.2.2
  • OS: Manjaro 21.1.2 Pahvo
  • GPU: NVIDIA GeForce GTX 1080 Ti
  • Camera device: Logitech Webcam C930e

Directory Structure

The directory structure of this project is presented below. You can create these files beforehand or download them all directly from my GitHub repository. Moreover, you need to create your own directory inside the dataset directory and insert your desired images. In my case, as presented below, I have added four images within the directory Kasper. The more images you add of the same person in different scenarios, the more robust your predictions will become. The contents of the scripts will be covered throughout the article.

Note: The file encodings.pkl will be generated automatically later.

The directory structure of the application.

Application Overview

The application contains a GUI with a panel displaying the camera device output and a button to activate/deactivate face recognition.

The application, where face recognition is currently turned on.

Docker

To create an isolated environment with Face Recognition, OpenCV, Dlib, Python and more installed, the following Dockerfile is utilized.
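A minimal sketch of what such a Dockerfile could look like is shown below. The base image tag and the exact package list are assumptions; adjust them to match your installed CUDA and CuDNN versions.

FROM nvidia/cuda:11.4.1-cudnn8-devel-ubuntu20.04

ENV DEBIAN_FRONTEND=noninteractive

# Build tools for Dlib, plus Python, Tkinter and GUI dependencies
RUN apt-get update && apt-get install -y \
    build-essential cmake git \
    python3 python3-pip python3-dev python3-tk \
    libgtk-3-dev \
    && rm -rf /var/lib/apt/lists/*

# Build Dlib from source so it picks up CUDA and CuDNN
RUN git clone https://github.com/davisking/dlib.git && \
    cd dlib && python3 setup.py install

# Install Face Recognition, OpenCV and Pillow for the Tkinter panel
RUN pip3 install face_recognition opencv-python pillow

WORKDIR /app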


Tkinter

To create an interactive GUI that can be controlled by the user to enable and disable face recognition, the Tkinter library is utilized. The following code creates the GUI that was previously presented.
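A minimal sketch of the GUI could look as follows; the class and method names here are illustrative rather than the exact ones in the repository.

import tkinter as tk

class App:
    def __init__(self, window):
        self.window = window
        self.window.title("Face Recognition")
        self.face_recognition_enabled = False
        # Panel that displays the frames from the camera device
        self.panel = tk.Label(window)
        self.panel.pack()
        # Button that toggles face recognition on and off
        self.button = tk.Button(window, text="Enable Face Recognition",
                                command=self.toggle)
        self.button.pack(fill="x")

    def toggle(self):
        self.face_recognition_enabled = not self.face_recognition_enabled
        text = "Disable" if self.face_recognition_enabled else "Enable"
        self.button.config(text=text + " Face Recognition")

if __name__ == "__main__":
    root = tk.Tk()
    App(root)
    root.mainloop()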


Camera Device

In order to fetch images from a camera device and update the Tkinter GUI, the following script can be utilized. It uses OpenCV's VideoCapture function, where the parameter should correspond to your device. The default camera id is usually 0; however, if it doesn't work, you can try 1 or -1. In case you wish to utilize a video instead, you should be able to replace the device id with a video path, though a few other adjustments might be required. After each frame is displayed, the update function schedules itself to run again after one millisecond.
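A sketch of such a video stream is shown below; the class name VideoStream and the wiring to the panel are illustrative.

import cv2
from PIL import Image, ImageTk

class VideoStream:
    def __init__(self, panel, device_id=0):
        self.panel = panel
        # Open the camera device; try 1 or -1 if 0 does not work
        self.capture = cv2.VideoCapture(device_id)

    def update(self):
        ok, frame = self.capture.read()
        if ok:
            # OpenCV delivers BGR, while PIL/Tkinter expect RGB
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            image = ImageTk.PhotoImage(Image.fromarray(frame))
            self.panel.config(image=image)
            self.panel.image = image  # keep a reference so it is not garbage collected
        # Call the function itself again after one millisecond
        self.panel.after(1, self.update)

Calling update() once is enough to start the loop, since the function keeps rescheduling itself.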

Computer Vision

In contrast to most computer vision applications, where you train a model to classify a desired class by presenting hundreds of examples of the class, face recognition can use deep metric learning. With deep metric learning, you train a model to describe the desired object instead of predicting which class it belongs to. In our case, we use it to produce a feature vector, a 128-dimensional encoding, that describes each face using real numbers. Teaching a model to describe faces instead of predicting a specific person is an advantage, since you do not have to retrain the model to recognize a new person. Instead, you simply save the encoding of the new person where the model can access it. We will obtain these encodings later in the article. With the Face Recognition framework, we don't have to train a model from scratch; instead we use the already trained model that the framework provides. If you wish to explore the field of face recognition further, Adam Geitgey, the author of the Face Recognition framework, elaborates on this topic:

Machine Learning is Fun! Part 4: Modern Face Recognition with Deep Learning
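To make the encoding concrete: with the Face Recognition framework, the encoding of a face is simply a 128-element vector. The image path below is hypothetical and assumes the image contains a visible face.

import face_recognition

# Load an image and compute the encoding of the first detected face
image = face_recognition.load_image_file("dataset/Kasper/image_1.jpg")
encoding = face_recognition.face_encodings(image)[0]
print(encoding.shape)  # (128,)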

Dataset & Encoder

For the model to be able to identify faces, a pickle file containing encodings of faces is needed. To achieve this, as mentioned in the section "Directory Structure", you must create a directory with the name of the desired person inside the dataset directory. The encoder should then contain the following code, which recursively obtains all images for each person inside the dataset directory. Using the Face Recognition framework, we locate the faces and extract an encoding from each image.
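A sketch of such an encoder is shown below; the exact structure stored in encodings.pkl is an assumption.

import os
import pickle
import face_recognition

DATASET_DIR = "dataset"
encodings, names = [], []

# Walk through dataset/<person>/<image> and encode every face found
for person in os.listdir(DATASET_DIR):
    person_dir = os.path.join(DATASET_DIR, person)
    for file_name in os.listdir(person_dir):
        image = face_recognition.load_image_file(os.path.join(person_dir, file_name))
        # Locate the faces first, then extract a 128-dimensional encoding per face
        boxes = face_recognition.face_locations(image)
        for encoding in face_recognition.face_encodings(image, boxes):
            encodings.append(encoding)
            names.append(person)

# Save the encoded dataset so the recognizer can load it later
with open("encodings.pkl", "wb") as f:
    pickle.dump({"encodings": encodings, "names": names}, f)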

Face Recognition

In order to recognize faces, the following code will be utilized. To summarize: it takes an image from our camera device, detects faces, extracts a 128-dimensional encoding for each face and then compares each new encoding with our encoded dataset. For the comparison, it computes the distance between the new encoding and every encoding in our dataset, and each known encoding whose distance is lower than the tolerance parameter casts a vote for its person. To find the best match, we simply choose the person with the highest number of votes. You can implement a stronger classifier if this simple solution is not sufficient.

More specifically, the VideoStream calls the function process_image for each frame, and this function does all the work needed to achieve face recognition. You can adjust the tolerance parameter for the comparison; the lower the tolerance, the stricter the comparison. Moreover, the highest_vote threshold can be adjusted as well, to either increase or decrease the strictness.

For more detailed information about the Face Recognition functions, refer to the docs.
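A sketch of the comparison and voting logic described above is shown below; the implementation in the repository may differ in its details.

import pickle
import face_recognition

# Load the encoded dataset created by encoder.py
with open("encodings.pkl", "rb") as f:
    data = pickle.load(f)

def process_image(frame, tolerance=0.6, highest_vote=3):
    boxes = face_recognition.face_locations(frame)
    names = []
    for encoding in face_recognition.face_encodings(frame, boxes):
        # Every known encoding closer than the tolerance casts a vote
        matches = face_recognition.compare_faces(
            data["encodings"], encoding, tolerance=tolerance)
        votes = {}
        for matched, name in zip(matches, data["names"]):
            if matched:
                votes[name] = votes.get(name, 0) + 1
        # Choose the person with the most votes, if the count is high enough
        best_name = "Unknown"
        if votes:
            best = max(votes, key=votes.get)
            if votes[best] >= highest_vote:
                best_name = best
        names.append(best_name)
    return boxes, names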


Shell Script (Warning)

To make life easier, create a shell script that runs xhost and Docker with the following parameters. For xhost, we enable access for everyone and disable it again after detaching from the Docker container. In the Docker command, we share the GPU, the display (to view the GUI), a volume (for continuous development) and the webcam device.

Warning: Using xhost and these Docker parameters should not be an issue when running locally. However, if used in production, security should be tightened, which is recommended here.
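A sketch of run.sh is shown below; the device path /dev/video0 and the mount target /app are assumptions that should match your setup.

#!/bin/bash
# Allow local connections to the X server so the GUI can be displayed
xhost +

# Share the GPU, the display, the project directory and the webcam device
docker run -it --rm \
    --gpus all \
    -e DISPLAY="$DISPLAY" \
    -v /tmp/.X11-unix:/tmp/.X11-unix \
    -v "$(pwd)":/app \
    --device /dev/video0 \
    facerecognition_gui bash

# Revoke the access again after detaching from the container
xhost -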


Run Application

  1. Build Docker image: docker build -t facerecognition_gui .
  2. Make the shell script executable: chmod +x ./run.sh
  3. Run the shell script: ./run.sh
  4. Inside the Docker container, create the encoded dataset (Ensure you have images located in a directory within the dataset directory): python3 encoder.py
  5. Run the application: python3 gui.py
  6. You can now enable face recognition.

Note: To detach from/exit the Docker container, press Ctrl+D.


Conclusion

In this article, you have been introduced to how a GUI, a camera device and a GPU can be used together with Docker. Knowing how to combine these provides numerous possibilities for both academic and commercial purposes. Moreover, with the same strategy of, for example, sharing devices through Docker, you should be able to utilize other libraries and frameworks without running into issues.

Thanks for reading

As always, feedback is more than welcome. I have deliberately excluded type hints to keep the code short. However, if you would find type hints useful, let me know.

