
Exquisite hand and finger tracking in web browsers with MediaPipe’s machine learning models


Get to know this proficient computer vision library in its JavaScript flavor, here focusing on its hand-tracking tool. With it, your web apps can detect and track 21 points per hand, obtaining fluid x, y, z coordinates for each point in real time, even for several hands at once! Check out their other ML-based computer vision tools too.


Introduction

MediaPipe offers cross-platform, machine learning-based solutions for augmented reality applications using nothing more than a device’s regular webcam. Specifically, it provides various kinds of detection and tracking algorithms (for faces, hands, bodies, etc.) that allow programmers to generate stunning visuals. Best of all, many of these tools are supported in JavaScript, which means you can add superb features to your web apps and pages.

Here I focus on presenting MediaPipe’s hand-tracking library for JavaScript, after the recent release of documentation and examples that run very well on my computer and my smartphone. This tool can detect and track hands in a video feed and, for each of them, return {x,y,z} coordinates for 21 nodes (4 points per finger plus 1 for the wrist).
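To make the numbering concrete, here is a small sketch of how the 21 points map onto the hand and what a single landmark object looks like (the constant names are my own illustrative choices, not part of the library):

// MediaPipe Hands numbers the landmarks 0-20: 0 is the wrist, then four
// points per finger from base to tip: thumb 1-4, index 5-8, middle 9-12,
// ring 13-16, pinky 17-20.
const WRIST = 0;
const INDEX_FINGER_TIP = 8;

// Each landmark is a plain object with normalized coordinates, e.g.:
// { x: 0.52, y: 0.31, z: -0.04 }
// x and y run from 0 to 1 relative to the image width and height; z is the
// depth relative to the wrist, with more negative values closer to the camera.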

Hands-on testing

By simply copying and pasting the minimal code provided by MediaPipe, I got it running on my website in 30 seconds. Make sure to serve it over HTTPS, as it needs access to the webcam. Here is a first proof of it running in Firefox:

First tests with MediaPipe in the browser (here Firefox, but it also ran perfectly in Chrome). Play and see the code at https://lucianoabriata.altervista.org/tests/mediapipe/hand-tracking-with-webcam-simplest.html. Note: To see the models estimated for your hands, scroll down to the canvas, which may take some seconds to display! Screenshot by author Luciano Abriata.

Using this web page, I further tested how well the tool detects hands in various poses… and as you can see, it got all of them right, even poses that intrinsically involve occlusion!

Testing different hand poses, including some that involve quite a bit of occlusion. Composed from screenshots by author Luciano Abriata.

You can run this example yourself at this link: https://lucianoabriata.altervista.org/tests/mediapipe/hand-tracking-with-webcam-simplest.html

Note: To see the models estimated for your hands, scroll down to the canvas, which may take some seconds to display!

Here’s the code. See how little you need to write:

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <script src="https://cdn.jsdelivr.net/npm/@mediapipe/camera_utils/camera_utils.js" crossorigin="anonymous"></script>
  <script src="https://cdn.jsdelivr.net/npm/@mediapipe/control_utils/control_utils.js" crossorigin="anonymous"></script>
  <script src="https://cdn.jsdelivr.net/npm/@mediapipe/drawing_utils/drawing_utils.js" crossorigin="anonymous"></script>
  <script src="https://cdn.jsdelivr.net/npm/@mediapipe/hands/hands.js" crossorigin="anonymous"></script>
</head>
<body>
  <div class="container">
    <video class="input_video"></video>
    <canvas class="output_canvas" width="1280" height="720"></canvas>
  </div>
  <script type="module">
// Grab the video element, the output canvas, and its 2D drawing context.
const videoElement = document.getElementsByClassName('input_video')[0];
const canvasElement = document.getElementsByClassName('output_canvas')[0];
const canvasCtx = canvasElement.getContext('2d');
// Called for each processed frame: draw the video frame, then overlay the
// detected hand landmarks on the canvas.
function onResults(results) {
  canvasCtx.save();
  canvasCtx.clearRect(0, 0, canvasElement.width, canvasElement.height);
  canvasCtx.drawImage(
      results.image, 0, 0, canvasElement.width, canvasElement.height);
  if (results.multiHandLandmarks) {
    for (const landmarks of results.multiHandLandmarks) {
      drawConnectors(canvasCtx, landmarks, HAND_CONNECTIONS,
                     {color: '#00FF00', lineWidth: 5});
      drawLandmarks(canvasCtx, landmarks, {color: '#FF0000', lineWidth: 2});
    }
  }
  canvasCtx.restore();
}
// Instantiate the tracker; the model files are fetched from the CDN.
const hands = new Hands({locateFile: (file) => {
  return `https://cdn.jsdelivr.net/npm/@mediapipe/hands/${file}`;
}});
// Track up to two hands with moderate confidence thresholds.
hands.setOptions({
  maxNumHands: 2,
  minDetectionConfidence: 0.5,
  minTrackingConfidence: 0.5
});
hands.onResults(onResults);
// Feed webcam frames to the tracker (Camera comes from camera_utils.js).
const camera = new Camera(videoElement, {
  onFrame: async () => {
    await hands.send({image: videoElement});
  },
  width: 1280,
  height: 720
});
camera.start();
  </script>
</body>
</html>
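A detail worth noting: the landmark coordinates in results.multiHandLandmarks are normalized, so to draw or measure anything yourself you must scale them to pixels. Here is a minimal sketch (my own helper, not part of MediaPipe’s API), assuming the onResults callback above:

// Convert a normalized landmark to pixel coordinates on the output canvas.
function landmarkToPixels(landmark, canvas) {
  return { x: landmark.x * canvas.width, y: landmark.y * canvas.height };
}

// Example use inside onResults, after checking results.multiHandLandmarks:
// const tip = results.multiHandLandmarks[0][8];  // index fingertip
// const { x, y } = landmarkToPixels(tip, canvasElement);
// console.log(`Index fingertip at (${x.toFixed(0)}, ${y.toFixed(0)})`);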

You can also see the core of this code with some basic information at MediaPipe’s site:

Hands

Testing more advanced features in MediaPipe’s CodePen example

MediaPipe just released a CodePen example that demonstrates various features such as:

  • Changing the webcam and swapping the video feed (key for handling applications across devices like phones, tablets, and laptops),
  • Tracking 2, 3, or 4 hands (I tried 4 hands and it still works very well! See the sketch after this list for how to set that option),
  • Tuning the detection parameters (although the default parameters worked very well in all the conditions I tried),
  • Displaying the refresh rate in fps while running.
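Based on the setOptions call from the minimal example above, tracking four hands should just be a matter of changing the options (a sketch; the confidence values are my own guesses, not tuned recommendations):

// Track up to four hands; slightly higher confidence thresholds can reduce
// spurious detections at the cost of some sensitivity.
hands.setOptions({
  maxNumHands: 4,
  minDetectionConfidence: 0.6,
  minTrackingConfidence: 0.6
});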

I copied this example into a clean web page at my website. It looks like this when running:

MediaPipe’s pen, cleaned up at https://lucianoabriata.altervista.org/tests/mediapipe/hand-tracking-with-webcam.html and running on my smartphone. Screenshot by author Luciano Abriata.

The original pen is at https://codepen.io/mediapipe/pen/RwGWYJw but you can try it more easily here: https://lucianoabriata.altervista.org/tests/mediapipe/hand-tracking-with-webcam.html

Performance of the hand tracking tool

I got around 25 fps when tracking 2 hands on my laptop, and around 20 fps on my smartphone. Neither device is very new (the laptop is a 2017 Toshiba with an i7 and 8 GB RAM, no dedicated GPU; the phone is a 2019 Google Pixel 3), and both had lots of programs and browser tabs open. So although there is room for improvement, I’d say the library performs quite well.
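If you want to measure the frame rate on your own device, here is a minimal sketch (my own addition, reusing the hands object and onResults callback from the earlier example) that estimates fps from the interval between consecutive results:

// Rough fps estimate: time the interval between consecutive onResults calls.
let lastTime = performance.now();
hands.onResults((results) => {
  const now = performance.now();
  console.log(`~${(1000 / (now - lastTime)).toFixed(1)} fps`);
  lastTime = now;
  onResults(results);  // keep drawing the landmarks as before
});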

With the default parameters, the example ran very well under a variety of lighting conditions. In fact, I couldn’t find a situation in which it failed to detect my hands properly!

All of MediaPipe’s tracking tools

As of September 7th, 2021, MediaPipe offers not only hand and finger tracking but also face detection and face mesh computation, iris detection, whole-body pose detection, hair segmentation, general object detection and tracking, feature matching, and automatic video cropping. Not all of these tools are available in JavaScript (at least as of today), but they may well become available in the future. To learn more about the MediaPipe tools available for JavaScript, check this out:

MediaPipe in JavaScript

And to know more about all these tools in all programming languages, check their main website:

GitHub – google/mediapipe: Cross-platform, customizable ML solutions for live and streaming media.

Imagining the range of potential applications

On its websites, MediaPipe anticipates applications in art (see some amazing examples here: https://developers.googleblog.com/2021/07/bringing-artworks-to-life-with-ar.html), communication, especially sign language (with a project ongoing here: https://developers.googleblog.com/2021/04/signall-sdk-sign-language-interface-using-mediapipe-now-available.html), and control of prostheses and robotic hands and arms (e.g. https://developers.googleblog.com/2021/05/control-your-mirru-prosthesis-with-mediapipe-hand-tracking.html).

In fact, Google Meet already uses some of MediaPipe’s tools, for example to control backgrounds (https://ai.googleblog.com/2020/10/background-features-in-google-meet.html). Google is also using this technology to track full-body poses (https://ai.googleblog.com/2020/08/on-device-real-time-body-pose-tracking.html), which could be reused by gym apps, and perhaps even to produce more objective evaluations of gymnastics performances.

I can also foresee applications in human-computer interfaces for augmented reality, where users grab objects with their hands to explore them in 3D as if they were physically holding them. This is similar to what high-end devices like the Oculus Quest or Microsoft’s HoloLens provide, but running on any device.

And as a scientist and amateur music learner, I cannot overlook the potential of the hand-tracking tool for studying limb motion and coordination in drummers, guitarists, etc. Although the technology may not yet be responsive enough for normal playing speeds, I think it is already good enough for some initial investigations. I will play with this, and if I find something interesting, I will let you know in a new post.
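As a first step in that direction, one could simply record time-stamped fingertip positions for offline analysis. A hypothetical sketch, again reusing the hands object and onResults callback from the minimal example:

// Accumulate time-stamped index-fingertip positions for later analysis.
const trajectory = [];
hands.onResults((results) => {
  if (results.multiHandLandmarks && results.multiHandLandmarks.length > 0) {
    const tip = results.multiHandLandmarks[0][8];  // index fingertip
    trajectory.push({ t: performance.now(), x: tip.x, y: tip.y, z: tip.z });
  }
  onResults(results);  // keep drawing as before
});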


Liked this article and want to tip me? [Paypal] -Thanks!

I am a nature, science, technology, programming, and DIY enthusiast. Biotechnologist and chemist, in the wet lab and in computers. I write about everything that lies within my broad sphere of interests. Check out my lists for more stories. Become a Medium member to access all stories by me and other writers, and subscribe to get my new stories by email (original affiliate links of the platform).

