📱 Mobile Machine Learning

Using FaceNet For On-Device Face Recognition With Android

Leveraging the power of FaceNet and Firebase MLKit on Android.

Shubham Panchal
Towards Data Science
5 min read · Jun 21, 2020



The demand for face recognition systems is growing day by day, as the need to recognize and classify many people instantly increases. Be it your office’s attendance system or the simple face detector in your phone’s camera, face detection systems are everywhere. On edge devices this demand is even greater, as they serve purposes like surveillance of a crowd or of passengers at an airport, a bus stand, and so on.

Today, we are going to create a similar face recognition app, totally from scratch. But wait, there’s something special!

Imagine you are using a face recognition system in your office. For the 10 employees in your office, the system works perfectly fine. But now a friend of yours joins you in the office. The system could fail to recognize your friend, as it was never trained to do so. So, for every new employee, you would need to modify and reprogram the system.

What if,

You don’t need to retrain the system! Why can’t you just save the images of the 10 employees as well as of your friend, and have the app instantly recognize your friend too? No need to reprogram the app or any other system; everything runs on Android! As new employees join, keep adding their images in separate folders, and the app is ready to recognize them.

By the end of this story, we’ll have created exactly such an app!

The GitHub project ->


Prerequisites

In our app, we’ll be using CameraX, Firebase MLKit, and TensorFlow Lite. If you haven’t worked with these libraries before, make sure you have a look at them.

A bit on FaceNet

FaceNet maps a face image to a 128-dimensional embedding, trained so that embeddings of the same person lie close together while those of different people lie far apart. Recognition then boils down to comparing the embedding of a new face with the embeddings of known faces.

1. Convert the Keras model to a TFLite model

The FaceNet Keras model is available in the nyoki-mtl/keras-facenet repo. After downloading the .h5 model, we’ll use the tf.lite.TFLiteConverter API to convert our Keras model to a TFLite model.
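A minimal sketch of the conversion, assuming TensorFlow 2.x (the file names here are placeholders):

```python
import tensorflow as tf

# Load the FaceNet Keras model (facenet_keras.h5 from nyoki-mtl/keras-facenet).
model = tf.keras.models.load_model('facenet_keras.h5')

# Convert the Keras model to the TFLite format.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save the converted model; this file ships with the Android app's assets.
with open('facenet.tflite', 'wb') as f:
    f.write(tflite_model)
```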

Converting a Keras model to TFLite.

2. Setting up a Preview and ImageAnalyser using CameraX

To implement a live camera feed, we use CameraX. I have used the code available in the official docs. Next, we create a FrameAnalyser class which implements the ImageAnalysis.Analyzer interface; it will help us retrieve camera frames and run inference on them.
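A skeletal sketch of that class, assuming the single-argument analyze signature (it has varied across CameraX releases):

```kotlin
import androidx.camera.core.ImageAnalysis
import androidx.camera.core.ImageProxy

// Receives camera frames from the CameraX ImageAnalysis use case.
class FrameAnalyser : ImageAnalysis.Analyzer {

    override fun analyze(image: ImageProxy) {
        // Convert the frame to a Bitmap, detect faces, and run FaceNet here.
        // Always close the frame so CameraX can deliver the next one.
        image.close()
    }
}
```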

Setting up FrameAnalyser class.

All our classification code will live in the analyze method. First, using Firebase MLKit, we’ll get bounding boxes for all faces present in the camera frame (a Bitmap object). We’ll create a FirebaseVisionFaceDetector which runs the face detection model on a FirebaseVisionImage object.
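A sketch of that detection step with the legacy Firebase MLKit face detection API (the onFacesDetected callback name is my own):

```kotlin
import android.graphics.Bitmap
import android.graphics.Rect
import com.google.firebase.ml.vision.FirebaseVision
import com.google.firebase.ml.vision.common.FirebaseVisionImage

// Detect all faces in a camera frame and collect their bounding boxes.
fun detectFaces(frame: Bitmap, onFacesDetected: (List<Rect>) -> Unit) {
    val detector = FirebaseVision.getInstance().visionFaceDetector
    val inputImage = FirebaseVisionImage.fromBitmap(frame)
    detector.detectInImage(inputImage)
        .addOnSuccessListener { faces ->
            // Each FirebaseVisionFace exposes a boundingBox (a Rect).
            onFacesDetected(faces.map { it.boundingBox })
        }
        .addOnFailureListener { e -> e.printStackTrace() }
}
```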

Implementing the FirebaseVisionFaceDetector.

3. Producing Face Embeddings Using FaceNet and Comparing Them

First, we’ll produce face embeddings using our FaceNet model. Before that, we’ll create a helper class for handling the FaceNet model. This helper class will:

  1. Crop the given camera frame using the bounding box (a Rect) which we got from Firebase MLKit.
  2. Transform this cropped image from a Bitmap to a ByteBuffer with normalized pixel values.
  3. Finally, feed the ByteBuffer to our FaceNet model using the Interpreter class provided by the TF Lite Android library.

In the snippet below, see the getFaceEmbedding() method, which encapsulates all of the above steps.
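Here’s a hedged sketch of such a helper; the 160×160 input size and the pixel normalization shown are assumptions based on common FaceNet ports, not necessarily the exact values used in the project:

```kotlin
import android.graphics.Bitmap
import android.graphics.Rect
import org.tensorflow.lite.Interpreter
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Helper that turns a detected face into a 128-d FaceNet embedding.
class FaceNetModel(private val interpreter: Interpreter) {

    private val inputSize = 160      // assumed FaceNet input resolution
    private val embeddingDim = 128   // embedding size used throughout this story

    fun getFaceEmbedding(frame: Bitmap, box: Rect): FloatArray {
        // 1. Crop the face out of the camera frame using the MLKit bounding box.
        val face = Bitmap.createBitmap(frame, box.left, box.top, box.width(), box.height())
        // 2. Resize it and convert it to a ByteBuffer with normalized pixel values.
        val input = bitmapToBuffer(Bitmap.createScaledBitmap(face, inputSize, inputSize, true))
        // 3. Feed the ByteBuffer to the TFLite Interpreter.
        val output = Array(1) { FloatArray(embeddingDim) }
        interpreter.run(input, output)
        return output[0]
    }

    private fun bitmapToBuffer(bitmap: Bitmap): ByteBuffer {
        val buffer = ByteBuffer.allocateDirect(inputSize * inputSize * 3 * 4)
            .order(ByteOrder.nativeOrder())
        val pixels = IntArray(inputSize * inputSize)
        bitmap.getPixels(pixels, 0, inputSize, 0, 0, inputSize, inputSize)
        for (pixel in pixels) {
            // Scale each RGB channel to [-1, 1]; the exact normalization is an assumption.
            buffer.putFloat(((pixel shr 16 and 0xFF) - 127.5f) / 127.5f)
            buffer.putFloat(((pixel shr 8 and 0xFF) - 127.5f) / 127.5f)
            buffer.putFloat(((pixel and 0xFF) - 127.5f) / 127.5f)
        }
        return buffer
    }
}
```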

Implementing a helper class for FaceNet

Now, we have a class that returns the 128-dimensional embedding for every face present in a given image. Back in the FrameAnalyser’s analyze() method, using the helper class we just created, we’ll produce face embeddings and compare each of them with a set of embeddings that we already have.

Before that, we need to get the set of predefined embeddings, right? These embeddings refer to the people whom we need to recognize. So, the app reads the images folder present in the internal storage of the user’s device. If the user wants to recognize two people, namely Rahul and Neeta, then they need to create two separate sub-directories within the images folder and place an image of Rahul and of Neeta in the respective sub-directories.

Our aim is to read these images and produce a HashMap<String,FloatArray> object where the key (String) will be the subject’s name, like Rahul or Neeta, and the value (FloatArray) will be the corresponding face embedding. You’ll get an idea of the process by looking at the code below.
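A sketch of that loading step; detectFace (returning the bounding box of the single face in a stored photo) is an assumed helper, and FaceNetModel is the class from the previous step:

```kotlin
import android.graphics.Bitmap
import android.graphics.BitmapFactory
import android.graphics.Rect
import java.io.File

// Build the name -> embedding map from the "images" folder on device storage.
fun loadKnownEmbeddings(
    imagesDir: File,
    model: FaceNetModel,
    detectFace: (Bitmap) -> Rect
): HashMap<String, FloatArray> {
    val embeddings = HashMap<String, FloatArray>()
    imagesDir.listFiles()?.filter { it.isDirectory }?.forEach { personDir ->
        personDir.listFiles()?.firstOrNull()?.let { imageFile ->
            val bitmap = BitmapFactory.decodeFile(imageFile.absolutePath)
            // Key: sub-directory name (e.g. "Rahul"); value: its face embedding.
            embeddings[personDir.name] = model.getFaceEmbedding(bitmap, detectFace(bitmap))
        }
    }
    return embeddings
}
```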

Reading and generating embeddings for images present in the device storage.

The next step is to compare the embeddings with a suitable metric. We can use either the L2 norm or cosine similarity; the metricToBeUsed variable selects between them. We compute the score for each image, then the average score for each user. The user with the best average score is our output.
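A minimal sketch of the two metrics over 128-dimensional embeddings:

```kotlin
import kotlin.math.sqrt

// L2 (Euclidean) distance between two embeddings; smaller = more similar.
fun l2Norm(x: FloatArray, y: FloatArray): Float {
    var sum = 0f
    for (i in x.indices) sum += (x[i] - y[i]) * (x[i] - y[i])
    return sqrt(sum)
}

// Cosine similarity between two embeddings; larger = more similar.
fun cosineSimilarity(x: FloatArray, y: FloatArray): Float {
    var dot = 0f; var normX = 0f; var normY = 0f
    for (i in x.indices) {
        dot += x[i] * y[i]
        normX += x[i] * x[i]
        normY += y[i] * y[i]
    }
    return dot / (sqrt(normX) * sqrt(normY))
}
```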

A snapshot from README.md.

The predictions array is then supplied to the BoundingBoxOverlay class, which draws the bounding boxes and also displays the labels. In the BoundingBoxOverlay.kt class, we use two Matrix objects to transform the output and display it on the screen. For the front camera, we have to flip the coordinates of the bounding boxes; otherwise, we’ll see a mirror image of the boxes on the overlay.
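A simplified sketch of such an overlay; the two coordinate-transform matrices (including the front-camera flip) are assumed to have been applied before the boxes reach this view:

```kotlin
import android.content.Context
import android.graphics.Canvas
import android.graphics.Color
import android.graphics.Paint
import android.graphics.RectF
import android.util.AttributeSet
import android.view.View

// Draws labelled bounding boxes on top of the camera preview.
class BoundingBoxOverlay(context: Context, attrs: AttributeSet) : View(context, attrs) {

    // (label, box) pairs produced by the FrameAnalyser,
    // already mapped to view coordinates.
    var predictions: List<Pair<String, RectF>> = emptyList()

    private val boxPaint = Paint().apply {
        color = Color.GREEN
        style = Paint.Style.STROKE
        strokeWidth = 4f
    }
    private val textPaint = Paint().apply {
        color = Color.GREEN
        textSize = 48f
    }

    override fun onDraw(canvas: Canvas) {
        super.onDraw(canvas)
        for ((label, box) in predictions) {
            canvas.drawRect(box, boxPaint)
            canvas.drawText(label, box.left, box.top - 8f, textPaint)
        }
    }
}
```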

Displaying the bounding boxes and labels.

The Results

Using the app, I have tried to recognize the faces of Jeff Bezos and Elon Musk:

The working of the app.

Also, I had stored the images in my internal storage as follows:
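One sub-directory per person under the images folder, something like this (the file names here are placeholders):

```
images/
├── jeff_bezos/
│   └── image_1.jpg
└── elon_musk/
    └── image_1.jpg
```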

File Structure


The End

I hope you liked the story. Thanks for reading!
