📱 Mobile Machine Learning
Using FaceNet For On-Device Face Recognition With Android
Leveraging the power of FaceNet and Firebase MLKit on Android.
The demand for face recognition systems is increasing day by day, driven by the need to recognize and classify many people instantly. Be it your office’s attendance system or a simple face detector in your phone’s camera app, face recognition systems are everywhere. On edge devices, this demand is even higher, as such devices serve many purposes like surveillance of a crowd, of passengers at an airport or a bus stand, and so on.
Today, we are going to create a similar face recognition app, totally from scratch. But wait, there’s something special!
Imagine you are using a face recognition system in your office. For the 10 employees in your office, the system works perfectly fine. But now a friend of yours joins the office. The system could fail to recognize your friend, as it was never programmed to do so. So, for every new employee, you would need to modify the system and reprogram it.
What if,
you didn’t need to retrain the system at all? Just save the images of the 10 employees, as well as of your friend, and the app would instantly recognize your friend too. No need to reprogram the app or any other system; everything runs on Android! As new employees join, keep adding their images in separate folders and the app is ready to recognize them.
By the end of this story, we’ll have created exactly such an app!
The GitHub project ->
Prerequisites
In our app, we’ll be using CameraX, Firebase MLKit, and TensorFlow Lite. If you haven’t worked with these libraries before, make sure you have a look at them.
- CameraX : Official Codelab
- Firebase MLKit : Detect Faces with ML Kit on Android
- TensorFlow Lite on Android
A bit on FaceNet
FaceNet, introduced by researchers at Google in 2015, maps a face image to a 128-dimensional embedding such that embeddings of the same person lie close together while those of different people lie far apart. Recognizing a face then reduces to comparing its embedding with a set of known embeddings using a distance metric. To read more about it, have a look at:
- FaceNet: A Unified Embedding for Face Recognition and Clustering
- FaceNet — Using Facial Recognition System
1. Convert the Keras model to a TFLite model
The FaceNet Keras model is available in the nyoki-mtl/keras-facenet repo. After downloading the .h5 model, we’ll use the tf.lite.TFLiteConverter API to convert our Keras model to a TFLite model.
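Here’s a minimal conversion script, assuming TensorFlow 2.x; the file names facenet_keras.h5 and facenet.tflite are just illustrative:

```python
import tensorflow as tf

# Load the downloaded Keras model. compile=False skips restoring the
# training configuration, which we don't need for inference.
model = tf.keras.models.load_model('facenet_keras.h5', compile=False)

# Convert the Keras model to a TFLite flatbuffer.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save the converted model so it can be bundled in the app's assets folder.
with open('facenet.tflite', 'wb') as f:
    f.write(tflite_model)
```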
2. Setting up a Preview and ImageAnalyser using CameraX
To implement a live camera feed, we use CameraX. I have used the code available in the official docs. Next, we create a FrameAnalyser class which implements CameraX’s ImageAnalysis.Analyzer interface; it will help us retrieve camera frames and run inference on them.
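Here’s a rough sketch of the idea, assuming the stable CameraX 1.x API (names may differ slightly across CameraX versions):

```kotlin
import androidx.camera.core.ImageAnalysis
import androidx.camera.core.ImageProxy

// Receives every frame delivered by the ImageAnalysis use case.
class FrameAnalyser : ImageAnalysis.Analyzer {

    override fun analyze(image: ImageProxy) {
        // Convert the frame to a Bitmap, then run face detection
        // and FaceNet inference here.
        // Always close the frame so CameraX can deliver the next one.
        image.close()
    }
}

// Wiring it up while binding the camera use cases (inside the Activity):
// val imageAnalysis = ImageAnalysis.Builder()
//     .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
//     .build()
// imageAnalysis.setAnalyzer(ContextCompat.getMainExecutor(this), FrameAnalyser())
// cameraProvider.bindToLifecycle(this, CameraSelector.DEFAULT_FRONT_CAMERA, preview, imageAnalysis)
```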
All our classification code will go in the analyze method. First, using Firebase MLKit, we’ll get bounding boxes for all faces present in the camera frame (a Bitmap object). We’ll create a FirebaseVisionFaceDetector which runs the face detection model on a FirebaseVisionImage object.
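A sketch of the detection step, assuming the firebase-ml-vision dependency from that era (ML Kit has since been split out of Firebase):

```kotlin
import android.graphics.Bitmap
import com.google.firebase.ml.vision.FirebaseVision
import com.google.firebase.ml.vision.common.FirebaseVisionImage

// One detector instance, reused across frames.
private val detector = FirebaseVision.getInstance().visionFaceDetector

fun detectFaces(frame: Bitmap) {
    val image = FirebaseVisionImage.fromBitmap(frame)
    detector.detectInImage(image)
        .addOnSuccessListener { faces ->
            // Each FirebaseVisionFace exposes a boundingBox (android.graphics.Rect)
            // which we pass on to the FaceNet helper in the next step.
            val boundingBoxes = faces.map { it.boundingBox }
        }
        .addOnFailureListener { e -> e.printStackTrace() }
}
```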
3. Producing Face Embeddings using FaceNet and Comparing them.
First, we’ll produce face embeddings using our FaceNet model. Before that, we’ll create a helper class for handling the FaceNet model. This helper class will:
- Crop the given camera frame using the bounding box (a Rect) which we got from Firebase MLKit.
- Transform this cropped image from a Bitmap to a ByteBuffer with normalized pixel values.
- Finally, feed the ByteBuffer to our FaceNet model using the Interpreter class provided by the TF Lite Android library.
In the snippet below, see the getFaceEmbedding() method which encapsulates all the above steps.
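Here’s a minimal sketch of such a helper. The 160×160 input size, the [-1, 1] pixel scaling, and the 128-d output are assumptions in this sketch; check the actual model’s input details in the project:

```kotlin
import android.graphics.Bitmap
import android.graphics.Rect
import org.tensorflow.lite.Interpreter
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Wraps the FaceNet TFLite model behind a single getFaceEmbedding() call.
class FaceNetModel(private val interpreter: Interpreter) {

    private val inputSize = 160      // assumed model input resolution
    private val embeddingDim = 128   // assumed embedding size

    fun getFaceEmbedding(frame: Bitmap, boundingBox: Rect): FloatArray {
        // 1. Crop the face out of the full frame (clamped to the frame bounds).
        val x = boundingBox.left.coerceAtLeast(0)
        val y = boundingBox.top.coerceAtLeast(0)
        val w = boundingBox.width().coerceAtMost(frame.width - x)
        val h = boundingBox.height().coerceAtMost(frame.height - y)
        val face = Bitmap.createBitmap(frame, x, y, w, h)

        // 2. Resize to the model's input size and normalize the pixels.
        val input = bitmapToBuffer(Bitmap.createScaledBitmap(face, inputSize, inputSize, true))

        // 3. Run inference; the model outputs one embedding per input image.
        val output = Array(1) { FloatArray(embeddingDim) }
        interpreter.run(input, output)
        return output[0]
    }

    private fun bitmapToBuffer(bitmap: Bitmap): ByteBuffer {
        val buffer = ByteBuffer.allocateDirect(4 * inputSize * inputSize * 3)
            .order(ByteOrder.nativeOrder())
        val pixels = IntArray(inputSize * inputSize)
        bitmap.getPixels(pixels, 0, inputSize, 0, 0, inputSize, inputSize)
        for (pixel in pixels) {
            // Scale each RGB channel from [0, 255] to [-1, 1].
            buffer.putFloat(((pixel shr 16 and 0xFF) - 127.5f) / 127.5f)
            buffer.putFloat(((pixel shr 8 and 0xFF) - 127.5f) / 127.5f)
            buffer.putFloat(((pixel and 0xFF) - 127.5f) / 127.5f)
        }
        return buffer
    }
}
```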
Now we have a class that returns the 128-dimensional embedding for every face present in a given image. Let’s come back to FrameAnalyser’s analyze() method. Using the helper class we just created, we’ll produce face embeddings and compare each of them with a set of embeddings that we already have.
Before that, we need to get the set of predefined embeddings, right? These embeddings refer to the people whom we need to recognize. So, the app will read the images folder present in the internal storage of the user’s device. If the user wants to recognize two people, namely Rahul and Neeta, they need to create two separate directories within the images folder and place an image of Rahul and of Neeta in the respective sub-directories.
Our aim is to read these images and produce a HashMap<String, FloatArray> object, where the key (String) will be the subject’s name, like Rahul or Neeta, and the value (FloatArray) will be the corresponding face embedding. You’ll get an idea of the process by looking at the code below.
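As a sketch, the loop below walks the images folder, one sub-directory per person. Here embedFirstFace is a hypothetical helper standing in for the MLKit-detect-then-FaceNet pipeline described above:

```kotlin
import android.graphics.Bitmap
import android.graphics.BitmapFactory
import java.io.File

// Builds the name -> embedding map from images/<person-name>/<photo>.jpg.
fun loadKnownEmbeddings(
    imagesDir: File,
    embedFirstFace: (Bitmap) -> FloatArray
): HashMap<String, FloatArray> {
    val embeddings = HashMap<String, FloatArray>()
    imagesDir.listFiles()?.filter { it.isDirectory }?.forEach { personDir ->
        personDir.listFiles()?.firstOrNull()?.let { photo ->
            val bitmap = BitmapFactory.decodeFile(photo.absolutePath)
            // Key: the subject's name (directory name); value: the embedding.
            embeddings[personDir.name] = embedFirstFace(bitmap)
        }
    }
    return embeddings
}
```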
The next step is to compare the embeddings using a suitable metric: either the L2 norm or cosine similarity, selected through the metricToBeUsed variable. We compute a score for each image, then the average score for each user; the user with the best average score is our output.
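The two metrics are straightforward to write by hand; the metricToBeUsed flag follows the article’s naming, and the rest of this sketch is illustrative. Note that for the L2 norm a smaller score is better, while for cosine similarity a larger score is better:

```kotlin
import kotlin.math.sqrt

var metricToBeUsed = "l2"  // or "cosine"

// Euclidean (L2) distance between two embeddings; smaller means more similar.
fun l2Norm(a: FloatArray, b: FloatArray): Float {
    var sum = 0f
    for (i in a.indices) {
        val d = a[i] - b[i]
        sum += d * d
    }
    return sqrt(sum)
}

// Cosine similarity between two embeddings; larger means more similar.
fun cosineSimilarity(a: FloatArray, b: FloatArray): Float {
    var dot = 0f; var normA = 0f; var normB = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}
```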
The predictions array is then supplied to the BoundingBoxOverlay class, which draws the bounding boxes and displays the labels. In BoundingBoxOverlay.kt, we use two Matrix objects to transform the model output and display it on the screen. For the front camera, we have to flip the x-coordinates of the bounding boxes, else we’ll see a mirror image of the boxes on the overlay.
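The flip itself is a one-line Matrix transform; here’s a sketch, assuming the overlay knows the analysed frame’s width:

```kotlin
import android.graphics.Matrix
import android.graphics.RectF

// Mirrors a bounding box horizontally so the overlay matches the
// front-camera preview instead of appearing as its mirror image.
fun flipForFrontCamera(box: RectF, frameWidth: Float): RectF {
    val flip = Matrix().apply {
        // Scale x by -1 around the vertical centre line of the frame.
        postScale(-1f, 1f, frameWidth / 2f, 0f)
    }
    val mirrored = RectF(box)
    flip.mapRect(mirrored)
    return mirrored
}
```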
The Results
Using the app, I tried to recognize the faces of Jeff Bezos and Elon Musk. I had stored their images in my internal storage as such:
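The layout follows the scheme from earlier, one sub-directory per person (the file names here are illustrative):

```
images/
├── Jeff Bezos/
│   └── jeff_1.jpg
└── Elon Musk/
    └── elon_1.jpg
```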
More Resources
The End
I hope you liked the story. Thanks for reading!