
Stop running face recognition until you’ve read this

There is a really big problem today with using machine-learning-powered facial recognition on video. In fact, it can be a show-stopper.

Face recognition can be a powerful tool for telling you who is in a video. It’s great for tagging actors, politicians, and sports players in the billions of hours of media content that is made available to us every day. It’s also great for finding suspects in security camera footage or locating relatives in old family photos.

Most of us don’t mind the second or two it takes to recognize some faces in a photo, nor are we bothered by the processing time required to find people in a single piece of video.

But if you multiply this out by hundreds or thousands of assets, you start to see that a significant amount of time and resources needs to be dedicated to processing facial recognition.

With today’s machine-learning-as-a-service (MLaaS) offerings like Google Vision, Microsoft Azure, and IBM Watson, you pass all of your video assets to their public API endpoints and in return get back metadata about who appears in the video and where. The metadata might look like this:

{  
   "faces":[  
      {  
         "key":"Al Roker",
         "instances":[  
            {  
               "start":150,
               "end":150,
               "start_ms":5005,
               "end_ms":5005,
               "confidence":0.6983038746995329
            },
            {  
               "start":480,
               "end":660,
               "start_ms":16016,
               "end_ms":22022,
               "confidence":0.6699914024543004
            },
            {  
               "start":780,
               "end":990,
               "start_ms":26026,
               "end_ms":33033,
               "confidence":0.7077699155373681
            }
         ]
      },
      {  
         "key":"Unknown Face 5abc120a9b25d163",
         "instances":[  
            {  
               "start":60,
               "end":60,
               "start_ms":2002,
               "end_ms":2002
            }
         ]
      }
   ]
}

So the face recognition or celebrity detection model found Al Roker in some video footage, with timestamps for where he appears.
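As a quick illustration, here’s how you might walk that metadata to list each appearance. This is a minimal sketch in Python; the field names follow the JSON above, and note that unknown faces may carry no confidence score:

import json

def list_appearances(metadata_json: str) -> None:
    # Parse the response and print each face's time ranges.
    data = json.loads(metadata_json)
    for face in data["faces"]:
        for inst in face["instances"]:
            # start_ms/end_ms are milliseconds into the video; confidence
            # is absent for unknown faces, as in the example above.
            conf = inst.get("confidence")
            note = f" (confidence {conf:.2f})" if conf is not None else ""
            print(f'{face["key"]}: {inst["start_ms"]}ms to {inst["end_ms"]}ms{note}')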

But it also found an unknown face. If this is surfaced correctly in a UI, the next step should be for a human reviewer to tag that unknown person. That, in turn, should train the facial recognition model.

That last step is a challenge in and of itself with the current MLaaS offerings, but it actually gets worse.

Let’s say that data represents hundreds or thousands of hours of video (obviously it would be a lot more data in real life), and let’s say I have gone through and taught the system who all the unknown faces are. I would still have to rerun all of that video through the newly trained facial recognition system to reap the benefit of those newly taught people.

We can flip the problem on its head as well. Let’s say I get a mugshot of a new suspect or convicted criminal. As a law enforcement agency, I’d like to see whether that person appears in the millions of hours of security camera and bodycam footage I’ve been collecting and have already run through facial recognition systems. Using today’s MLaaS tools, I’d have to first train the face recognition model with the new face, then rerun all those millions of hours of footage through the newly trained model.

These are show-stoppers because it just isn’t economical to rerun facial recognition every time you have new training you want to take advantage of.

This is why my company, Machine Box, recently released a feature in our facial recognition model Facebox called Faceprint.

The benefit of this feature is that you only ever need to run all of your video footage through Facebox once. After you’ve processed it, you can retroactively apply any celebrity recognition training you want. You can keep updating your model, teaching it with new people, showing it new mugshots, and correcting errors as many times as you’d like, as often as you’d like, and never have to reprocess the video to take advantage of all of that new learning.

Imagine what that means for a second. You can keep applying new celebrity recognition models to your video without reprocessing it.

This keeps you from being locked into a pre-trained celebrity recognition model that may be substandard at the time of processing, or that simply may not include all the people you’d like it to know.

With Faceprint, there’s no longer any need to wait until the perfect face recognition model comes along.

How does it work?

Faceprint works by giving you a unique hash of each face detected by Facebox. You can store this hash alongside the metadata you get back about who the face is. Later, you can teach Facebox new celebrities, upload a state file with pre-trained faces in it, or correct the existing model. Instead of running all the video through Facebox again, you simply give it that hash (or Faceprint), and in return you’ll get a tag from the updated model. You can run this entire operation on your whole database of Faceprints incredibly quickly, much faster than processing the images themselves. You might trigger the task manually when you’ve loaded a new Facebox state file, or more frequently, when your users have been tagging unknown faces all day, to keep your database up to date.
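To make this concrete, here is a rough sketch of that workflow against Facebox’s HTTP API, written in Python with requests. The endpoint paths and field names (/facebox/check, /facebox/teach, /facebox/faceprints/check, the faceprint field) reflect my reading of the Facebox docs, so treat them as assumptions and confirm against the current documentation:

import requests

FACEBOX = "http://localhost:8080"  # assumes a local Facebox instance

def extract_faceprints(frame_path: str) -> list[dict]:
    """Run a frame through Facebox once and keep the faceprint hashes."""
    with open(frame_path, "rb") as f:
        resp = requests.post(f"{FACEBOX}/facebox/check", files={"file": f})
    resp.raise_for_status()
    # Each detected face carries a 'faceprint' hash we can store alongside
    # the tag metadata and re-check later, without touching the video again.
    return [
        {"name": face.get("name"), "faceprint": face["faceprint"]}
        for face in resp.json().get("faces", [])
    ]

def teach(image_path: str, person_id: str, name: str) -> None:
    """Teach Facebox a newly tagged face (e.g. a corrected unknown)."""
    with open(image_path, "rb") as f:
        requests.post(
            f"{FACEBOX}/facebox/teach",
            files={"file": f},
            data={"id": person_id, "name": name},
        ).raise_for_status()

def recheck(faceprints: list[str]) -> list[dict]:
    """Re-identify stored faceprints against the updated model: no video
    reprocessing, just the hashes."""
    resp = requests.post(
        f"{FACEBOX}/facebox/faceprints/check",
        json={"faceprints": faceprints},
    )
    resp.raise_for_status()
    return resp.json().get("faces", [])

Note how the mugshot scenario above collapses into a single pass over your faceprint database: teach the new face once, then recheck the stored hashes instead of the millions of hours of footage.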

This new feature of Facebox will save you tremendous amounts of time and money by cutting out the constant reprocessing. And as you know, Facebox runs on-premises, so you never have to upload video to the web to take advantage of state-of-the-art face recognition.

What is Machine Box?

Machine Box puts state-of-the-art machine learning capabilities into Docker containers so developers like you can easily incorporate natural language processing, facial detection, object recognition, and more into your own apps very quickly.
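For example, getting a box running locally is a single Docker command. The image name and the MB_KEY activation variable follow the Machine Box quick start of the time, so confirm them against the current docs:

docker run -p 8080:8080 -e "MB_KEY=$MB_KEY" machinebox/facebox

Your app then talks to it over plain HTTP on localhost, exactly as in the sketch above.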

The boxes are built for scale, so when your app really takes off, just add more boxes horizontally, to infinity and beyond. Oh, and it’s way cheaper than any of the cloud services (and the models might be better)… and your data doesn’t leave your infrastructure.

Have a play and let us know what you think.

